SPECIAL TOPICS 
IN SUPERCOMPUTING 


Volume 6 


Series Editors: 


G. RODRIGUE 
University of California at Davis 
Lawrence Livermore National Laboratory 
Livermore, CA, U.S.A. 


G. MICHAEL 


Lawrence Livermore National Laboratory 
Livermore, CA, U.S.A. 




NORTH-HOLLAND 
AMSTERDAM · LONDON · NEW YORK · TOKYO


A COMPARATIVE STUDY
OF PARALLEL 
PROGRAMMING LANGUAGES: 
THE SALISHAN PROBLEMS 


Edited by 


John T. FEO 
Computer Research Group 
Lawrence Livermore National Laboratory 
Livermore, CA, U.S.A. 


1992 


NORTH-HOLLAND 
AMSTERDAM · LONDON · NEW YORK · TOKYO


ELSEVIER SCIENCE PUBLISHERS B.V. 
Sara Burgerhartstraat 25 
P.O. Box 211, 1000 AE Amsterdam, The Netherlands 


ISBN 0 444 88135 2 
© 1992 Elsevier Science Publishers B.V. All rights reserved. 


No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
without the prior written permission of the publisher, Elsevier Science Publishers
B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam,
The Netherlands.


Special regulations for readers in the U.S.A. - This publication has been registered with the
Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be
obtained from the CCC about conditions under which photocopies of parts of this publication
may be made in the U.S.A. All other copyright questions, including photocopying
outside of the U.S.A., should be referred to the copyright owner, Elsevier Science
Publishers B.V., unless otherwise specified.


No responsibility is assumed by the publisher for any injury and/or damage to persons or
property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions or ideas contained in the material herein.
pp. 217-262: Copyright not transferred. 


This book is printed on acid-free paper. 


Printed in The Netherlands 


Introduction to the Series 


Large scale computing is a growing field of research that plays a vital role in 
the advancement of science, engineering, and modern industrial technology. 
Computing is fast becoming the most frequently used technique to explore 
new questions. In just the last few years, the inclusion of computer modeling
has produced results that were inconceivable a decade ago. Computer models are an
indispensable tool in many areas, from climate studies to chemical dynamics,
from automated manufacturing to operating hospital intensive care units.
Computer simulations of scientific processes provide, in many cases, sub- 
stitutions for actual experiments. These simulations are less expensive and 
can address a wider range of problems. Computer simulations also provide 
an understanding of physical problems that cannot be obtained from experi- 
ments alone. 

Research problems of many sorts are now becoming increasingly dependent 
on computer models, and numerical experiments are taking their place along- 
side the more traditional methods of research. Along with the theoretical 
and experimental, there is now a computational aspect of science. 


Increasing sophistication in research has led to a need for bigger and faster 
computers; for Supercomputers. In this quest, supercomputers are them- 
selves stimulating the redevelopment of the methods of computation. Results 
in one area are quickly adapted for another. The effect is making super- 
computation a multi-disciplinary adventure. Research scientists in super- 
computing come from a variety of interests and backgrounds and can be 
found in all universities, laboratories and industries. 

In supercomputing, a large overlap is found between the academic areas of 
engineering and physical sciences and the academic areas of mathematics, 
computer science, and computer engineering. The scientific jargons used in 
each of these areas are different and require translation or understanding 
before being able to make progress in the supercomputing sciences. 

Although many advances have made the process easier in many cases, it has 
not kept pace with the dramatic increase in demands placed on the computer 
as well as the growing complexity of the computer hardware that has occurred 
over the past decade. 


Special Topics in Supercomputing will take on two directions. First, in 
recognition of the fact that research in supercomputing is constantly embark- 
ing in new directions, a part of the series will be devoted to topics that have 
just begun to solidify as a well-defined research area. These volumes will 
contain manuscripts from researchers who are current leaders in the field. 




Second, Special Topics in Supercomputing will include as part of its series 
a collection of monographs and contributed volumes on new- and well- 
established areas of supercomputing. As certain topics of supercomputing 
begin to solidify on a firm theoretical foundation, they will be coalesced into 
monograph or textbook form by author(s) who are experienced in the field. 
On the other hand, important areas of supercomputing are so new that few 
manuscripts can be collected to warrant a full-scale book. These topics will 
then be included in the contributed volumes of the series. 


A hope of this enterprise will be to make supercomputing more widely understood,
and more accessible in fields where it has not yet penetrated because
of insufficient information.


G. Rodrigue 


Preface 


As execution speeds reach the physical limits of single-CPU computers,
our only hope of achieving greater computing power is parallel
systems. An area of intense research is parallel programming lan- 
guages. Researchers have proposed countless numbers of new pro- 
gramming languages, extensions to existing languages, and program- 
ming tools. Unfortunately, the research is confined primarily to 
academia with little input from the user community. The differences, 
similarities, strengths, weaknesses, and appropriate problem domains 
of the various approaches are subtle and often not well understood. 
Given this confusion, an informed comparison of parallel languages is 
difficult. 


In this book we use a basis of comparison that both the language 
and computation communities can understand and appreciate. We 
compare eight parallel programming languages based on solutions to 
four problems. Each chapter includes a description of the language’s 
philosophy, semantics and syntax, and a solution to each problem. 
The chapters are written by recognized experts; in some cases, the 
author is the language designer. We believe that by discussing solu- 
tions rather than language features or theoretical properties, we may 
bridge the gap between the language specialists and users. 


This book is appropriate as a supplementary text for a graduate 
class in parallel programming languages. Since the book approaches 
the study of languages from the standpoint of computations, both 
computer science and computational science graduate students will 
benefit from the book. 


We invite the proponents of other parallel languages to publish 
their own solutions to the Salishan Problems in other forums. This 
book includes only a subset of the classes of parallel programming lan- 
guages. We would like to see solutions in all classes of languages, and
in as many instances of each class as possible. If a real implementation
of the language exists, we recommend that the author discuss perfor- 
mance. In the eyes of most application programmers, a language is 
only as good as its performance. We also encourage scientists in all 
fields to suggest new problems whose computational models are not 
already included. Our hope is that the set of problems will become 
recognized as a standard by which to compare parallel programming 
languages, just as the Livermore Loops and Linpack are used to evalu- 
ate computer systems. 


John Feo 


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V.


The Salishan Problems 


John Feo 
Computing Research Group, L-306 
Lawrence Livermore National Laboratory 
Livermore, CA 94550 


1. Introduction


Comparisons of parallel programming languages are more often 
based on theoretical criteria than practical ones. Traditional studies 
have compared parallel languages with respect to argument passing, 
evaluation order, support for concurrency, synchronization and com- 
munication methodologies, and support for operations on specific data 
structures such as arrays, lists, or trees. Typically, the studies use ar- 
tificial or trivial program examples to illustrate the language features. 
They rarely discuss how the different approaches affect the develop- 
ment of real applications, leaving it to the reader to infer the effects of 
the language features on his or her problem. 


How many talks have we attended in which the speaker drones on 
for forty minutes describing the syntax of yet another parallel language 
and then exhibits the code for factorial, Fibonacci, or the Eight Queens 
problem? The lecturer leaves the application programmer with little 
sense of the real advantages and disadvantages of the language, what 
types of parallel programs the language can and cannot express effi- 
ciently, and even whether or not the language can express a particular 
algorithm. By describing the trees and not the forest, and by using ir- 
relevant examples, the speaker often fails to convince the listener to 
try the new language. 




Consequently, the development of parallel programming languages 
is confined primarily to academic circles with little input from the 
user community. The latter are perplexed by the academic arguments 
used to promote the languages. They perceive the language designers 
as more interested in designing erudite linguistic properties than the 
functionality required to support large-scale applications. On the 
other hand, application programmers have been slow to consider new 
ideas and new programming styles. They have been reluctant to try 
new languages, thereby losing an opportunity to influence their design. 
They continue to write in sequential languages and dream of automatic 
parallelizing compilers transforming their “dusty deck” codes for each 
new class of parallel machines. 


Recognizing the abyss that has existed between the language and 
application communities, the 1988 Salishan High-Speed Computing 
Conference included four sessions on parallel programming languages. 
The Conference is a yearly meeting sponsored by Lawrence Livermore 
National Laboratory and Los Alamos National Laboratory. It is held at 
the Salishan Lodge in Gleneden Beach, Oregon. The Conference pro- 
vides a forum for architects and users of high-speed supercomputers 
to meet and influence each other’s work. Attendees include computer 
and physical scientists from academia, industry, and the national labo- 
ratories. 


Wishing to avoid the “typical” programming language talk, the 
Program Committee adopted a controlled format [3]. It defined four 
nontrivial problems (the Salishan Problems), and invited five speakers 
to present solutions to the problems in different parallel languages. 
The Committee set the solutions to the problems as the basis for com- 
parison. The problems encompassed a variety of programming models 
and data structures, including: dynamic tasks, synchronous tasks, 
asynchronous tasks, nested loop parallelism, arrays, streams, and re- 
cursive tree structures. 




Each speaker had five fifteen-minute periods in which to describe
his language and his solution to each problem. The speakers were in- 
structed to discuss philosophy, rather than syntax. They were told to 
relate the good and the bad, what was easy and what was difficult, what
types of applications the language expressed well and which it ex- 
pressed poorly. Since some of the languages lacked real implementa- 
tions, performance was not to be an issue. Given the “vocal” nature of 
Salishan audiences, few speakers got away with anything. In general, 
the sessions were considered a success. The presentations sparked 
much discussion at the evening receptions in the Sunset Suite. 


This book is a consequence of that success. It includes eight chap- 
ters, each describing a different parallel programming language and its 
solutions to the Salishan Problems. The languages are: Ada, C*, Has- 
kell, Id, Occam, Program Composition Notation, Scheme, and Sisal. 


Developed by the Department of Defense, Ada is a high-level imper- 
ative programming language designed to support modern software 
engineering techniques. C* and Occam are language abstractions of 
parallel computer architectures. The former supports data parallel, 
SIMD computing, whereas the latter supports Communicating Se- 
quential Processes. Haskell, Id, and Sisal are functional languages. 
Based on the principles of mathematics, they hide the details
(concurrency, communication, and synchronization) of parallel pro- 
cessing. Scheme is a derivative of Lisp that includes both functional 
and imperative constructs. At some universities, it is the first pro- 
gramming language taught to students. PCN is a notation for compos- 
ing programs written in a set of base languages. It provides a frame- 
work for program development from problem specification to efficient 
source code for the target machine. 


The solutions, and not the features of the languages, form the ba- 
sis of comparison. We encourage the reader to study each solution 
carefully. It is important to understand both the code and the design 
process. In the next section, we give the instructions to the authors.
We ask that all authors presenting solutions to the Salishan Problems
follow these instructions. In sections 3, 4, 5, and 6, we define the 
four Salishan problems. 


2. Instructions to the authors 


You should begin by discussing the reasons your language was developed,
the types of computations and the forms of parallelism it can
and cannot express, and its advantages and disadvantages from a pro- 
grammer’s point of view. You should present enough syntax to allow a 
knowledgeable reader unfamiliar with your language to understand 
your solutions. You should then give a solution and a critical analysis of 
your solution for each problem. 


You are free to choose any algorithm you wish, but be warned that 
an inefficient algorithm makes a language look bad. Readers will infer 
that either the language is unable to express the more efficient algo- 
rithm or the language’s character is such that it naturally suggests the 
less efficient solution. Be sure to discuss what about the problem is 
easy to express and what is difficult. Again, use the “best” algorithm 
for your language. 


You should write complete solutions; do not bury nonstandard oper- 
ations in library routines. For example, if your language does not have 
streams but you want to use them in a solution, please give the state- 
ments defining streams in your language and the code for the stream 
operations you use. Design the routines for each problem as if you 
were designing them for a library. Do not include I/O. 


One final point: you must present both the good and the bad. There 
are no perfect programming languages and no perfect solutions. All 
languages have flaws and all impose restrictions on the solution space. 
We insist you discuss these. 




3. Hamming’s Problem (extended)


Given a set of primes $\{a, b, c, \ldots\}$ of unknown size and an integer
$n$, output in increasing order and without duplicates all integers of the
form

$$a^i \cdot b^j \cdot c^k \cdots \le n.$$

Observe that if $r$ is in the output stream then

$$a \cdot r,\; b \cdot r,\; c \cdot r,\; \ldots$$

are also in the output stream, provided they do not exceed $n$.


The problem tests a language’s ability to express recursive stream 
computations and producer/consumer parallelism, and to support dy- 
namic task creation. 
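As a point of reference only, the specification can be sketched sequentially. The block below is written in Python, which is not one of the languages compared in this book, and it uses a priority queue rather than the recursive streams the problem is meant to exercise. The function name `hamming` is hypothetical, and the sketch assumes that 1 (all exponents zero) belongs to the output.

```python
import heapq


def hamming(primes, n):
    """Return, in increasing order and without duplicates, every integer
    of the form a^i * b^j * c^k * ... (exponents >= 0) that is <= n.

    Assumption: 1 (all exponents zero) counts as part of the output.
    """
    if n < 1:
        return []
    out = []
    heap = [1]        # candidates not yet emitted, smallest first
    seen = {1}        # every value ever pushed, to suppress duplicates
    while heap:
        r = heapq.heappop(heap)
        out.append(r)
        # If r is in the output, so is p * r for each prime p (when <= n).
        for p in primes:
            m = p * r
            if m <= n and m not in seen:
                seen.add(m)
                heapq.heappush(heap, m)
    return out
```

For example, `hamming([2, 3, 5], 20)` yields the fourteen values 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20. A stream-based solution would instead merge the streams obtained by scaling the output by each prime, which is where the producer/consumer parallelism arises.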


4. Paraffins Problem


Given an integer $n$, output the chemical structure of all paraffin
molecules for $i \le n$, without repetition and in order of increasing size.
Include all isomers, but no duplicates. The chemical formula for
paraffin molecules is $C_iH_{2i+2}$. You may choose any representation for
the molecules you wish, so long as it clearly distinguishes among isomers.


The problem is discussed in [4], but the solution is inefficient. A 
more efficient algorithm exists based on the theory of free and ori- 
ented trees [2]. The problem addresses the representation of recur- 
sive tree structures, the creation and manipulation of those structures, 
nested loop parallelism, and, if Turner’s solution is used, pattern 
matching and set operations. 
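To make the tree-based idea concrete, here is a minimal Python sketch (Python is not one of the book's languages) that enumerates each molecule exactly once by building it around its centroid. A molecule is either two equal-sized radicals joined by a bond, or a central carbon carrying four radicals that each hold fewer than half the carbons. This is one possible reading of the free-tree approach, not necessarily the algorithm given in [2]. The names `size_tuples`, `multisets`, `radicals`, `paraffins`, and the `"BCP"`/`"CCP"` tags are all hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def size_tuples(total, parts, largest):
    """Nondecreasing tuples of `parts` sizes in 0..largest that sum to total."""
    def rec(remaining, count, smallest):
        if count == 0:
            if remaining == 0:
                yield ()
            return
        for s in range(smallest, min(largest, remaining) + 1):
            for rest in rec(remaining - s, count - 1, s):
                yield (s,) + rest
    yield from rec(total, parts, 0)


def multisets(rads, sizes):
    """All multisets holding one radical of each size listed in `sizes`."""
    groups = sorted(Counter(sizes).items())
    pools = [combinations_with_replacement(rads[s], c) for s, c in groups]
    for combo in product(*pools):
        yield tuple(r for part in combo for r in part)


def radicals(limit):
    """rads[k] lists the radicals with k carbons; "H" is the empty radical."""
    rads = [["H"]]
    for k in range(1, limit + 1):
        level = []
        for sizes in size_tuples(k - 1, 3, k - 1):
            level.extend(("C",) + m for m in multisets(rads, sizes))
        rads.append(level)
    return rads


def paraffins(n):
    """Return a list whose entry i-1 holds all paraffins with i carbons."""
    rads = radicals(n // 2)
    result = []
    for i in range(1, n + 1):
        mols = []
        if i % 2 == 0:  # bond-centered: two radicals of i/2 carbons each
            half = rads[i // 2]
            mols.extend(("BCP",) + pair
                        for pair in combinations_with_replacement(half, 2))
        # carbon-centered: four radicals, each with fewer than i/2 carbons
        for sizes in size_tuples(i - 1, 4, (i - 1) // 2):
            mols.extend(("CCP",) + m for m in multisets(rads, sizes))
        result.append(mols)
    return result
```

The nested loops over size tuples and radical multisets are independent of one another, which is the nested loop parallelism the problem statement refers to.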




5. The Doctor's Office 


Given a set of patients, a set of doctors, and a receptionist, model 
the following interactions: initially, all patients are well, and all doc- 
tors are in a FIFO queue awaiting sick patients. At random times, pa- 
tients become sick and enter a FIFO queue for treatment by one of the 
doctors. The receptionist handles the two queues, assigning patients 
to doctors in a first-in-first-out manner. Once a doctor and patient are 
paired, the doctor diagnoses the illness and cures the patient in a 
random amount of time. The patient is then released, and the doctor 
rejoins the doctor’s queue to await another patient. The output of the 
problem is intentionally unspecified. The problem is adapted from [3].


This is neither an event-driven nor a time-driven simulation. 
There is no global knowledge, no knowledge of when events will oc- 
cur, no global clock, and no global communication. You may use any 
method you wish to decide when a patient becomes sick and how long 
a patient sees a doctor. The interactions of the patients, doctors, and 
receptionist should be true to life. Our intent is to test each lan- 
guage’s ability to program a set of concurrent, asynchronous processes 
with circular dependencies. 
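The interactions can be sketched with threads and FIFO queues. The Python block below is an illustration under several assumptions of its own (Python is not one of the book's languages): each patient falls sick a fixed number of times so that the run terminates, shutdown is signalled with a `None` sentinel, and the "output" is simply a log of patient/doctor pairings. The name `simulate` and all of its parameters are hypothetical.

```python
import queue
import random
import threading
import time


def simulate(num_patients=4, num_doctors=2, visits=2, max_delay=0.005, seed=0):
    """Run the office until every patient has been cured `visits` times.
    Returns the (patient, doctor) pairings in order of completion."""
    rng, rng_lock = random.Random(seed), threading.Lock()
    sick = queue.Queue()     # FIFO queue of sick patients
    idle = queue.Queue()     # FIFO queue of doctors awaiting patients
    inbox = [queue.Queue() for _ in range(num_doctors)]
    log, log_lock = [], threading.Lock()

    def pause():             # a random amount of time passes
        with rng_lock:
            delay = rng.uniform(0, max_delay)
        time.sleep(delay)

    def patient(pid):
        for _ in range(visits):
            pause()                      # well for a while
            cured = threading.Event()
            sick.put((pid, cured))       # become sick, join the queue
            cured.wait()                 # wait until a doctor cures us

    def doctor(did):
        idle.put(did)                    # initially await a patient
        while True:
            case = inbox[did].get()
            if case is None:             # shutdown sentinel (assumption)
                return
            pid, cured = case
            pause()                      # diagnose and cure
            with log_lock:
                log.append((pid, did))
            cured.set()                  # release the patient
            idle.put(did)                # rejoin the doctors' queue

    def receptionist():
        while True:
            case = sick.get()            # oldest sick patient
            if case is None:
                for box in inbox:
                    box.put(None)
                return
            inbox[idle.get()].put(case)  # pair with longest-idle doctor

    patients = [threading.Thread(target=patient, args=(p,))
                for p in range(num_patients)]
    staff = [threading.Thread(target=doctor, args=(d,))
             for d in range(num_doctors)]
    staff.append(threading.Thread(target=receptionist))
    for t in staff + patients:
        t.start()
    for t in patients:
        t.join()
    sick.put(None)                       # everyone is cured: close the office
    for t in staff:
        t.join()
    return log
```

No thread consults a global clock or a schedule of future events; each blocks only on its own queue or event. The circular dependency (patient waits on doctor, doctor waits on receptionist, receptionist waits on patient) is visible in the three inner functions.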


6. Skyline Matrix Solver

Solve the system of linear equations,

$$A x = b$$

without pivoting where $A$ is an $n$ by $n$ skyline matrix. A skyline matrix
has nonzero values in row $i$ in columns $k$ through $i$, $1 \le k \le i$, and
nonzero values in column $j$ in rows $k$ through $j$, $1 \le k \le j$. The values of
$k$ are stored in two vectors: row and column. For example, if

[Example: a 7 × 7 skyline matrix A; entries illegible in the source]

then row = [1, 2, 2, 4, 2, 4, 7] and column = [1, 2, 3, 1, 3, 2, 7]. You
may assume any input form you wish for A, b, row, and column.


The problem was originally discussed in [1] and suggested to us by 
Alan Hindmarsh. We include the problem to test the ability of each 
language to define array structures that include nonessential elements 
(i.e., the zeros), and given those structures, support parallel and iterative
array computations efficiently (i.e., avoid the computations involving
the zeros).
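A minimal Python sketch of an envelope-aware solver follows (Python is not one of the book's languages). It performs LU factorization without pivoting and restricts every inner loop to the index ranges given by row and column, so the zeros outside the skyline are never touched. For readability it assumes A is passed as a dense list of lists, whereas a real solver would store only the envelope. The function name `skyline_solve` is hypothetical.

```python
def skyline_solve(A, b, row, column):
    """Solve A x = b by LU factorization without pivoting, visiting only
    entries inside the skyline.

    row[i] and column[j] are the 1-based values of k described in the text.
    Assumption: A is supplied densely and its pivots are nonzero.
    """
    n = len(A)
    first_col = [r - 1 for r in row]      # first nonzero column of each row
    first_row = [c - 1 for c in column]   # first nonzero row of each column
    LU = [[float(v) for v in r] for r in A]

    for i in range(n):
        # Row i of L (unit diagonal), columns first_col[i] .. i-1.
        for j in range(first_col[i], i):
            lo = max(first_col[i], first_row[j])
            s = sum(LU[i][k] * LU[k][j] for k in range(lo, j))
            LU[i][j] = (LU[i][j] - s) / LU[j][j]
        # Row i of U, only the columns whose skyline reaches up to row i.
        for j in range(i, n):
            if first_row[j] > i:
                continue
            lo = max(first_col[i], first_row[j])
            s = sum(LU[i][k] * LU[k][j] for k in range(lo, i))
            LU[i][j] -= s

    # Forward substitution: L y = b.
    y = [float(v) for v in b]
    for i in range(n):
        y[i] -= sum(LU[i][k] * y[k] for k in range(first_col[i], i))

    # Back substitution: U x = y.
    x = y[:]
    for i in reversed(range(n)):
        s = sum(LU[i][j] * x[j] for j in range(i + 1, n) if first_row[j] <= i)
        x[i] = (x[i] - s) / LU[i][i]
    return x
```

The sketch relies on the fact that, without pivoting, fill-in stays inside the envelope, so the factors share the skyline of A. The inner products for the entries of a given row are independent of one another, which is one place a parallel language can exploit concurrency.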


Acknowledgements 


I am indebted to the authors who contributed to this book. They 
have done an outstanding job. I especially appreciate that they fol- 
lowed instructions and solved the problems as specified. If they had 
been politicians, they each would have solved a different problem—the 
one they wanted to solve instead of the one asked. I would also like to 
thank the 1988 Salishan Conference Program Committee for trying 
something new. The language and application programming communities
must work together to develop the next generation of programming
languages. They must talk to one another.


This work was supported (in part) by the Applied Mathematics 
Program of the Office of Energy Research (U.S. Department of Energy) 
under contract No. W-7405-Eng-48 to Lawrence Livermore National 
Laboratory. 




Disclaimer 


This report was prepared as an account of work sponsored by the United States 
Government. Neither the United States nor the United States Department of Energy, 
nor any of their employees, nor any of their contractors, subcontractors, or their em- 
ployees, makes any warranty, express or implied, or assumes any legal liability or re- 
sponsibility for the accuracy, completeness or usefulness of any information, appara- 
tus, product or process disclosed, or represents that its use would not infringe privately- 
owned rights. 


Reference to a company or product name does not imply approval or recommendation
of the product by the University of California or the United States Department of
Energy to the exclusion of others that may be suitable.


The views, opinions, and/or findings contained in this report are those of the au- 
thors and should not be construed as an official Department of the Army position, pol- 
icy, or decision, unless so designated by other documents. 


References 


1. Eisenstat, S. C. and A. H. Sherman. Subroutines for envelope solu- 
tion of sparse linear systems. Research Report 35, Yale University, 
New Haven, CT, October 1974. 


2. Knuth, D. The Art of Computer Programming, Vol. 1: Fundamental 
Algorithms. Addison-Wesley, Reading, MA, 1973. 


3. McShea, M. K. Evaluation of Parallel Programming Languages. M.A. 
thesis, Department of Computer Science, The University of Texas 
at Austin, 1986. 


4. Turner, D. A. The semantic elegance of applicative languages. In
Proceedings of the Conference on Functional Programming 
Languages and Computer Architecture, Portsmouth, NH, October 
1981, 85-92. 




Ada Solutions to the Salishan Problems 


Kenneth W. Dritz 
Mathematics and Computer Science Division 
Argonne National Laboratory 
Argonne, Illinois 60439-4801 


This chapter presents, in Sections 2 through 5, revised Ada solu- 
tions to the four challenge problems featured in the Language Sessions 
of the Conference on High-Speed Computing held in Gleneden Beach, 
Oregon in March 1988. The chapter begins with an overview of Ada, 
emphasizing features that are noteworthy in the problem solutions. 


1. Language Features Relevant to the Salishan Problems


Ada is a high-level procedural language, with strong typing and 
other features common in languages of its genre, as well as with sev- 
eral innovations. The language has had an influence in such related 
areas as program development environments and software life-cycle 
design methodology, so that in a broad sense one can talk about “the 
Ada movement.” In this section, we cannot hope to provide a com- 
plete introduction to the language or to the modern software engi- 
neering principles that it supports especially well. For general back- 
ground, we refer the reader to any of the many excellent textbooks on 
Ada; for the details, the language reference manual is indispensable, 
and the textbooks are also valuable. Here, we will concentrate on 
those features particularly relevant to the Salishan Problems—namely, 
the tasking features that support concurrency and some of those sup- 
porting data structuring and abstraction. The language features de- 
scribed here, and used in the problem solutions, are those of Ada as 
standardized in 1983 [12]. The revision currently in progress, known
informally as Ada 9X [1], can be expected to allow some aspects of our
solutions to be more efficient and to be expressed more elegantly. 


As is well known, the language was created in response to the 
“software crisis” in the U. S. Department of Defense, a crisis charac- 
terized by rapid escalation of the cost of developing, maintaining, and 
adapting software while hardware costs were decreasing [2, 3]. Part of 
the problem lay in the programming languages being used by the de- 
fense services, and especially the number of those languages and the 
lack of interoperability among them. Another major part of the prob- 
lem was the inadequate support for modern principles of software 
engineering, both in the languages themselves and in the program- 
ming environments surrounding them. By mandating a single high- 
level language designed to support such principles as data abstraction, 
information hiding, both top-down and bottom-up design, secure pro- 
gramming, and reusability, the DoD expected to reduce the cost of 
software and increase its reliability, significantly. 


1.1. Tasking Features 


Much DoD software is for embedded systems, which have unique 
requirements for low-level I/O, asynchronous interrupts, parallelism, 
and real-time control. Tasking features were mandated for the new 
language by the Steelman requirements [11] in 1978. A high-level 
model of parallelism was adopted; based essentially on CSP [7], it em- 
ploys the concept of a rendezvous between tasks both for synchroniza- 
tion and for communication. Ada successfully marries the high-level 
model of concurrency with low-level features; for example, asyn- 
chronous interrupts, whose behavior is defined in terms of “virtual” 
tasks in the high-level model, bring certain low-level features—such as 
representation clauses—into play. Our interest here is not in these 
low-level features, but rather in the high-level model of tasking. 




Ada programs are strictly serial except where tasking is explicitly 
used to express concurrency. (Ada 9X, however, is likely to allow 
some degree of implicit parallelism so that, for example, vector archi- 
tectures can be exploited.) If a multiprocessor or distributed imple- 
mentation of Ada is available, concurrency—appropriately used—can 
provide some speedup. A more important role for concurrency, how- 
ever, is as a cognitive tool, like recursion; in that role, it can simplify 
the expression of certain algorithms—particularly nondeterministic 
ones, for which it is well suited. (Failure to exploit nondeterminism 
by specifying an ordering where none is needed is an error of over- 
specification; it can lead to inefficiency or, in some cases, to incom- 
pleteness.) The goal of achieving speedup, when the hardware per- 
mits it, should be secondary; indeed, concurrent Ada programs can be 
written without regard to the hardware on which they will eventually 
be run, because they will in fact run (without speedup) on uniproces- 
sors, where the implementation of Ada achieves the illusion of concur- 
rency by interleaving or time-slicing among the Ada tasks eligible for 
execution. 


The rendezvous model of tasking in Ada is at a higher level than 
that of monitors [6], which are favored by some researchers for the 
ease with which they can be efficiently implemented using lightweight 
hardware locks. Of course, implementation of the rendezvous mech- 
anism typically makes use of locks and locking internally, out of sight 
of the application programmer. And realization of the monitor para- 
digm within an Ada program, when desired for its logical properties, 
is easily achieved with the rendezvous mechanism. Unfortunately, be- 
cause of the richness of the full semantics of Ada tasking, implemen- 
tation of the rendezvous mechanism has not heretofore been efficient 
enough to allow monitors coded in Ada to compete favorably with 
those handcrafted directly out of locks and locking operations in other 
languages, or with those provided as primitive operations in other languages
(without all the semantic richness of Ada). Increasingly, however,
compiler vendors are making their compilers smart enough to
recognize when the rendezvous (without the more costly features of
the language) is used in a monitor metaphor, and by implementing the 
so-called “Habermann-Nassi optimization” [5] they are specializing the 
code generation in that case to match the best implementations of 
monitors in other languages. Nevertheless, new features are antici- 
pated in Ada 9X to provide for efficient mutually exclusive accesses to 
shared data by concurrent tasks. 


A task is a kind of Ada program unit like a procedure, except that it 
has a separate thread of control and cannot have formal parameters. 
Like other program units, tasks come in two separate parts—a speci- 
fication and a body. A task’s specification serves to define the visible 
interface between the task and other program units that might inter- 
act with it, while its body expresses its logical behavior. Tasks can be 
defined either as individual objects (single tasks) or as types: in the 
latter case, any number of identical objects of the type can be created, 
perhaps as components of composite objects (such as arrays).¹ Tasks,
like other objects, can be created either by elaborating (giving effect 
to) object declarations upon entering the scope of those declarations, 
or by evaluating allocators (the mechanism for dynamic allocation in 
Ada); they start executing upon creation.² In the doctor’s office problem,
for example, the receptionist is modeled as a single task; the patients
are modeled as an array of tasks of some appropriate task type;
and the doctors are modeled as individually created objects of another 
task type, using allocators. (The reason that two different techniques 
were used to manage these collections is discussed in the problem 
solution.) 


Tasks that work on disjoint sets of data and that never cooperate 
are uninteresting. More typically, tasks either share global data and 
synchronize (serialize) their references to them, or they own data 
outright and rely on the queuing and mutual exclusion of other tasks’ 
interactions with themselves, plus the single thread of control within 
themselves, to serialize references to the data. In the latter case, values
usually are transmitted between interacting tasks. The synchronization
and the communication aspects of the interaction are combined
in the rendezvous mechanism.


Once created, tasks execute in parallel with the rest of the program 
until they terminate or reach some kind of synchronization point—for 
example, that determined by a rendezvous with another task. 


In the simplest kind of rendezvous, the “calling task” makes an en- 
try call to an entry of the “called task.” (Entries of a task are declared 
in its specification.) An entry call has the syntax of a procedure call, 
except that the name involved is an expanded name (for example, t.e)
combining the name of the called task (t) and that of an entry (e) 
within it. Like a procedure call, an entry call can optionally include 
actual parameters, so it can pass values to, and receive them from, the 
called task. 


The called task’s role in a simple rendezvous is expressed by an 
accept statement, an executable statement whose syntax is depicted in 
Figure 1.³


accept <entry-name> [(<formal-parameter-list>)] [do
    <statement-list>
end [<entry-name>]];


Figure 1 - Syntax of the accept statement 


If the accept statement has formal parameters, their scope extends 
only to the end of the statement (i.e., to the semicolon in Figure 1).
The semantics of a rendezvous are as follows. The first of the two 
tasks to execute its part of the rendezvous (that is, either the entry 
call in the calling task or the accept statement in the called task) be- 
comes suspended until the other task executes its corresponding part. 
At that point, if the statement list is present, the calling task becomes 
suspended (if it is not already so), and the called task executes the 
statement list; the statement list may, of course, read the formal parameters
of mode in or in out, whose values have been transmitted
from the calling task, and store into the formal parameters of mode
out or in out. Finally, when the end of the statement list is reached
(or in its absence), both tasks continue in parallel—the calling task 
with the statement after its entry call and the called task with the 
statement after its accept statement. On the calling side, the actual 
parameters that had been associated with formal parameters of mode 
out or in out have, henceforth, the new values they received during 
the rendezvous. 
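The rendezvous semantics just described can be illustrated with a minimal sketch. The task name doubler, its entry double, and the enclosing procedure are hypothetical names chosen for this illustration; they do not appear in the problem solutions.

```ada
with ada.text_io; use ada.text_io;

procedure rendezvous_demo is

   -- Hypothetical task: doubles the value handed to it at each rendezvous.
   task doubler is
      entry double (x : in integer; result : out integer);
   end doubler;

   task body doubler is
   begin
      for count in 1 .. 2 loop
         -- The caller remains suspended while this statement list runs.
         accept double (x : in integer; result : out integer) do
            result := 2 * x;
         end double;
         -- From here on, caller and called task proceed in parallel.
      end loop;
   end doubler;

   answer : integer;

begin
   doubler.double (21, answer);       -- entry call using the expanded name t.e
   put_line (integer'image (answer));
   doubler.double (answer, answer);   -- the out parameter is updated on return
   put_line (integer'image (answer));
end rendezvous_demo;
```

Under these assumptions the program prints 42 and then 84. The doubler task accepts exactly two calls and then terminates by reaching the end of its body.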


If multiple tasks call a given entry of some other task, the calls 
remain queued in FIFO order, and each execution of an accept state- 
ment for the entry removes the oldest call from the queue. 


Conditional entry calls and accept statements (those that take an 
alternate path instead of waiting, if their counterparts have not already 
been executed) and timed entry calls and accept statements (those 
that time out and take an alternate path if their counterparts are not 
executed within a specified time) can be constructed, but we do not 
need them for the problem solutions. And entries of a task can be ag- 
gregated into entry families, which behave like arrays of entries and 
provide for the identity of an entry named in an entry call or accept 
statement to be computed (e.g., by the value of its subscript expres- 
sion); the problem solutions do not require entry families, either. 


Finally, tasks can be made to accept entry calls at any of several al- 
ternative entries. To achieve that effect, the called task executes what 
is called a selective wait, the syntax of which is illustrated (by means of 
a skeletal example) in Figure 2. If one or more of the entries named 
in the accept statements have already been called when the task 
reaches its selective wait, then one of those calls is chosen (in a 
manner not defined by the language), and a rendezvous is performed 
with the task making the call; at the conclusion of the rendezvous, the 
statement list (if any) following the accept statement that participated 
in the rendezvous is executed in parallel with the calling task.




select
    <accept-statement>
    [<statement-list>]
or
    <accept-statement>
    [<statement-list>]


Figure 2 - Syntactic example of a selective wait 


On the other hand, if no task has yet called any of the entries named 
in the accept statements, the task executing the selective wait 
becomes suspended until some task calls one of those entries, at 
which time it performs a rendezvous with that task. Other options can 
be used to add the conditional or timed behavior to a selective wait. 


The branches of a selective wait are called selective wait alterna- 
tives, of which there are several kinds. Those shown in Figure 2 are 
accept alternatives. Any of the selective wait alternatives can be pre- 
fixed by a guard containing a Boolean expression. When the task 
reaches a selective wait, all of the guards are evaluated first. A selec- 
tive wait alternative whose guard evaluates to true, or one without a 
guard, is said to be “open.” Only the open alternatives are considered 
further in the execution of the selective wait. The Boolean expres- 
sions in guards typically involve task-local “state” variables. The use of 
guards is an essential ingredient in the realization of monitors that can 
suspend a task requesting a monitor operation, when that operation 
cannot be performed until some other task first performs a comple- 
mentary monitor operation. (An example of such a monitor might be 
one that buffers items among a collection of tasks. The operation of 
delivering an item from the buffer to a requesting task cannot be per- 
formed when the buffer is empty; it must wait until some other task 
first performs the complementary operation of placing an item into
the buffer. A guard would be used to inhibit the acceptance of an entry
call requesting delivery of an item when the buffer is empty.) 
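A buffering monitor of the kind just described might be sketched as follows. This is an illustration only: the task name buffer, the entry names deposit and remove, and the capacity of four items are assumptions of the sketch. A terminate alternative is included so that the example program can end once the main procedure has finished.

```ada
with ada.text_io; use ada.text_io;

procedure buffer_demo is

   capacity : constant := 4;   -- assumed buffer size

   task buffer is
      entry deposit (item : in integer);
      entry remove (item : out integer);
   end buffer;

   task body buffer is
      slots : array (0 .. capacity - 1) of integer;
      first : integer := 0;   -- index of the oldest item
      count : integer := 0;   -- number of items currently held
   begin
      loop
         select
            when count < capacity =>     -- open only if there is room
               accept deposit (item : in integer) do
                  slots ((first + count) mod capacity) := item;
               end deposit;
               count := count + 1;
         or
            when count > 0 =>            -- open only if the buffer is nonempty
               accept remove (item : out integer) do
                  item := slots (first);
               end remove;
               first := (first + 1) mod capacity;
               count := count - 1;
         or
            terminate;   -- end the task once no caller can remain
         end select;
      end loop;
   end buffer;

   value : integer;

begin
   for k in 1 .. 3 loop
      buffer.deposit (10 * k);
   end loop;
   for k in 1 .. 3 loop
      buffer.remove (value);
      put_line (integer'image (value));
   end loop;
end buffer_demo;
```

The guards refer only to the task-local variables first and count. A call to remove that arrives while the buffer is empty simply stays queued on the entry until a deposit makes the second alternative open.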


This capability of waiting for interactions with any of several other 
tasks is crucial to the functioning of the solution to the doctor’s office 
problem. There, the receptionist must wait for requests by free doc- 
tors for sick patients, or notifications by patients that they are sick 
and in need of a doctor. These interactions can occur in an arbitrary 
and unpredictable order, and the receptionist must respond to each as 
it occurs. Guards play a crucial role in that solution, too; they are used 
to defer the receptionist’s interactions with free doctors until it 
knows that sick patients exist. Much of the desired behavior of the 
problem solution, including the queuing of doctors and patients until 
they can be paired up, is obtained for free from the semantics of the 
Ada tasking features. 


Termination of tasks in Ada is a deep subject unto itself, and we 
will give here only the briefest sketch of the possibilities. One way 
that tasks can terminate is by reaching the end of their execution (but 
they will wait at that point until all of their subtasks, if any, have ter- 
minated). A further way, expressed by another kind of selective wait 
alternative (a terminate alternative), is more appropriate for tasks,
such as monitors, that repetitively perform services for other tasks as 
long as those services are needed. Informally, this latter mechanism 
allows a service task to terminate, instead of waiting for a future entry 
call, when the receipt of an entry call from any task that could interact 
with it is no longer possible—that is, when all such tasks have either 
already terminated or are similarly stating their willingness to termi- 
nate instead of taking another entry call. Typically, when one task 
terminates by this mechanism, others (with which it could interact) 
do so at the same time, hence the name “distributed termination” for 
this mechanism. Distributed termination happens not to be used in 
the problem solutions. 




Although the morphological similarity of tasks and procedures has 
already been mentioned, it is clearly not correct to think of a whole 
task as the smallest unit of parallel execution. Tasks typically alternate 
between periods of suspension and execution, and the latter can be 
made as large or as small as is appropriate to the application at hand. 
It is also a mistake to think that Ada tasks are bound in a fixed way to 
processors or nodes of a multiprocessor system, with those processors 
or nodes frequently becoming blocked. Ada implementations typically 
maintain a queue of Ada tasks that are eligible for execution (not sus- 
pended), and each processor takes a task from that queue when its 
previous task terminates or becomes suspended. In some cases, the 
redirection of a processor's attention from one task to another can 
even be performed without a context switch (as, for example, with 
Habermann-Nassi optimization). 


A frequent criticism of Ada is its wordiness in comparison to other 
languages. In our context this shows up, for example, in the repetition 
of parameter profiles for task entries in both the specification and the 
body of a task. (The same kind of repetition occurs in subprogram 
specifications and bodies.) This particular kind of repetition can be 
attributed to the requirements of a powerful language feature called 
subprogram overloading, which can play an important role in the de- 
sign of abstractions. In general, the wordiness of Ada is sometimes a 
consequence of powerful language features and sometimes a conse- 
quence of the redundancy that both tames the power and increases 
the safety of programs (by enabling extensive compile-time checking). 
Because of what it represents, most Ada programmers quickly learn to 
accept the wordiness as an asset rather than a hindrance. 


Ada appears to be a good language in which to express a solution to 
the doctor’s office problem because the actors in the problem, and the 
interactions among them, map directly onto available features of Ada. 
The pairwise interactions among the various actors are reminiscent of 
the rendezvous itself; the desired kind of queuing is obtained for free;
and the merging of signals, as it were, is provided by the selective
wait. The first two of the four solutions to Hamming’s problem also 
appear attractive in Ada, given the abstraction of a stream, which— 
while not built into the language—is readily programmed in Ada. 
Those solutions are comparable to others in this book, in languages in 
which streams are either built in (as in SISAL [9]) or available as a 
previously programmed reusable* abstraction. The remaining two so- 
lutions to Hamming’s problem appear less attractive, but only because 
they are considerably more involved than the usual solutions. 


1.2. Data Structuring and Abstraction Features 


Among the features useful for defining complex data structures and 
abstractions, and for making them reusable, are packages, generic 
units, private types, access types, record types with discriminant parts 
or variant parts, and subprogram or operator overloading. The Ada 
problem solutions only scratch the surface of the vast well of potential 
benefits these features offer. Opportunities for their use suggested 
themselves quite naturally during the development of the solutions. 


Packages are one of the four kinds of program units in Ada, the 
other three being subprograms (functions or procedures), tasks, and 
generic units. Packages contain collections of declarations meant to 
be used together. For example, a math library is typically constituted 
as a package containing mathematical subprograms. Abstractions are 
usually organized as packages that declare types—usually private 
types—as well as explicitly defined operations applicable to objects of 
those types. Like other program units, packages have separate 
specifications and bodies. The declarations in a package specification, 
or rather in the part of a package specification called its visible part, 
are the ones exported to the user, or client, of the package. A package 
specification can include, among other things, declarations of types, 
declarations of objects, and the specifications of other (nested) 
program units of any of the four kinds. A package body contains the
bodies of the program units whose specifications occurred in its
specification; it may also include other declarations of types, objects, 
and program units, which are visible only there (i.e., they are not 
visible to the client of the package). Some packages do not need 
bodies; in particular, a package specification that declares only types 
and objects does not require a body. 


A package specification occurring alone as a compilation unit de- 
fines a library package; the specifications of subprograms and of 
generic units occurring alone as compilation units also define library 
units. (Tasks cannot occur alone, but must always be nested inside an- 
other program unit. This rule is connected to the semantics of dis- 
tributed termination of tasks. Tasks may, of course, be nested inside 
library packages.) A library unit is entered into a program library upon 
its successful compilation and is thereafter available to potential 
clients. A client—that is, a compilation unit that needs the facilities 
provided by the library unit—refers to the library unit in its context 
clause, which contains one or more with clauses optionally followed by 
use clauses. (We say that the compilation unit “withs” the library unit. 
In this chapter, we also say that the compilation unit is compiled “in 
the context of” the library unit.) A client of a library unit can be 
compiled before the body of the library unit is compiled; furthermore, 
the client need not be recompiled when, after the body of the library 
unit is initially compiled, it is subsequently changed. The specification 
of a library package, for example, contains all the interface information 
needed to compile references to the types, objects, and subprograms 
that the package exports. Thus, changes in the detailed implementa- 
tion of an abstraction do not require the client of an abstraction to be 
recompiled, provided that the interface remains unchanged. 


The solution to Hamming’s problem is organized as a library pack- 
age; the package exports a type, an internal package, and a procedure 
that solves the problem. To solve Hamming’s problem, the user writes 
a program unit that “withs” the Hamming’s problem package and then
calls the procedure that it exports. To interface to that procedure,
the user would also declare some objects of the types exported by the
Hamming’s problem package and by its internal package.


The solution to the paraffins problem is organized somewhat differ- 
ently, in three parts: a pair of library packages (one for radicals, the 
other for molecules) that export types and operations, and a library 
function that solves the problem. To solve the paraffins problem, the 
user writes a program unit that “withs” and calls the function. To in- 
terface to that function, the user would also “with” the molecules 
package and declare some objects of a type exported by it. Depending 
on the other needs of the user’s application, the user might also wish 
to “with” the radicals package. 


Generic units, which may be generic packages or generic subpro- 
grams, strongly support the aim of reusability by making it possible to 
parameterize packages or subprograms in a variety of ways. They serve 
as templates from which ordinary packages or subprograms are ob- 
tained by the process of instantiation, in which generic actual parame- 
ters are associated with the generic unit’s generic formal parameters. 
Generic units can be parameterized with types, computed values 
(playing the role of constants inside the generic unit), variables 
(providing aliasing between variables inside and outside the generic 
unit), and subprograms. The generic actual parameters are said to be 
“imported” into a generic unit; some of those imported by a generic 
package may be re-exported by the ordinary package obtained from it 
by instantiation. 
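As a minimal sketch of instantiation, the following generic function imports a type and a comparison subprogram. The names pick, max, and min are hypothetical and are nested in a demonstration procedure so that the example is self-contained; in practice such a unit would more likely be a library unit.

```ada
with ada.text_io; use ada.text_io;

procedure generic_demo is

   generic
      type item is private;                                    -- imported type
      with function "<" (left, right : item) return boolean;   -- imported subprogram
   function pick (left, right : item) return item;

   -- Returns whichever operand is "greater" under the imported ordering.
   function pick (left, right : item) return item is
   begin
      if left < right then
         return right;
      else
         return left;
      end if;
   end pick;

   -- Two ordinary functions obtained from the same template.
   function max is new pick (integer, "<");
   function min is new pick (integer, ">");   -- reversed ordering yields the minimum

begin
   put_line (integer'image (max (3, 7)));
   put_line (integer'image (min (3, 7)));
end generic_demo;
```

With these instantiations the program prints 7 and then 3. Supplying a different generic actual subprogram changes the behavior of the resulting function without any change to the template.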


The solution to the doctor's office problem is organized as a 
generic library procedure that imports a pair of functions. To solve 
the doctor’s office problem, the user writes a program unit that 
“withs” the generic procedure, instantiates it with the names of a pair 
of functions (provided by the user) giving the probability distributions 
of patients’ healthy periods and doctors’ cure times, and calls the
resulting ordinary procedure. Parameterization by a subprogram also
plays a minor role in the solution to the paraffins problem, where
generic procedures are used (both as library units and internally). At 
present, generic units must be used in this way to parameterize a sub- 
program with another subprogram, but Ada 9X is expected to provide 
a way to do this without using generic units. 


The solution to the skyline matrix problem is organized as a 
generic library package importing a floating-point type and a positive 
integer value. To solve the skyline matrix problem, the user writes a 
program unit that “withs” the generic package, instantiates it with a 
floating-point type (thus customizing the problem to the required 
precision) and a positive integer value giving the order of the problem, 
declares matrices and vectors of the types exported by the resulting 
ordinary package, manipulates the matrices and vectors using opera- 
tions exported by the package, and calls the function (also exported by 
the package) to solve the matrix equation. 


In Ada, a type is viewed as a set of values and a set of operations on 
the values. Operations of a type can be declared by the user in the 
form of a subprogram and are designated either by a name or a 
(predefined) operator symbol. In addition, operations appropriate to 
the nature of a type are implicitly declared and exported with the 
type. For one example, the declaration of a floating-point type
implicitly declares arithmetic operations on values of the type; for another,
the declaration of an array type implicitly declares component selec- 
tion by subscripting as an operation of the type. The user may declare 
private types, whose significance is that very few operations are im- 
plicitly exported by the type declaration; essentially, the details of the 
realization of the type are hidden from clients. A private type is de- 
fined in the visible part of a package specification. A corresponding 
full type declaration is given in another part of the package specifica- 
tion, called the private part. Beyond the few operations implicitly ex- 
ported for all private types, the only other operations available to 
clients of the package are those explicitly exported in the form of
subprograms declared in the package's visible part. Operations relevant
to the full type declaration are implicitly declared by it and are visible 
in the body of the package, but not to clients of the package. The use 
of private types is often combined with the use of generics to forge 
powerful and secure abstractions. Our use of private types in the solu- 
tions is limited to a brief discussion at the end of Section 5.2 of some 
interesting possibilities in connection with skyline matrices. 
However, in several of the problem solutions we assume the existence 
of generic library packages that provide generally useful abstractions, 
and these would certainly use private types.5 
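The division between the visible part and the private part can be illustrated by a small sketch. The package counters and its operations are hypothetical; the package is nested in a procedure here only to keep the example in one compilation unit.

```ada
with ada.text_io; use ada.text_io;

procedure private_type_demo is

   package counters is
      type counter is private;                    -- realization hidden from clients
      procedure increment (c : in out counter);   -- explicitly exported operations
      function value (c : counter) return natural;
   private
      type counter is record                      -- full type declaration
         ticks : natural := 0;
      end record;
   end counters;

   package body counters is
      procedure increment (c : in out counter) is
      begin
         c.ticks := c.ticks + 1;   -- component selection is visible in the body
      end increment;

      function value (c : counter) return natural is
      begin
         return c.ticks;
      end value;
   end counters;

   my_counter : counters.counter;

begin
   counters.increment (my_counter);
   counters.increment (my_counter);
   put_line (natural'image (counters.value (my_counter)));
   -- my_counter.ticks := 5;   -- would be rejected: the record structure is private
end private_type_demo;
```

The client can declare, assign, and compare objects of type counter, but it can inspect or change one only through increment and value. The program prints 2.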


Dynamic storage allocation is provided in Ada for applications re- 
quiring the construction of complex list or tree structures. An opera- 
tion called an allocator obtains storage for an object dynamically and 
returns a pointer to the newly allocated object. Pointer values are val- 
ues of a class of types called access types. The declaration of an access 
type designates the type of the objects pointed to by values of the ac- 
cess type, upholding Ada’s principle of strong typing. We use access 
types in the solutions in several ways. In the paraffins problem, we 
construct molecules and radicals as trees of linked records, and we 
store those of a given size in linked lists. In the doctor's office prob- 
lem, we represent the doctors as dynamically allocated tasks of some 
task type, with each task accessed through a pointer serving to iden- 
tify it to the receptionist and to the patient that it treats. In the sky- 
line matrix problem, we dynamically allocate the rows and columns of 
skyline matrices to accommodate the components inside the skyline 
envelope without wasting space for the zero components outside, and 
in fact a skyline matrix is represented as an array of pointers to the 
row vectors of the lower triangle and another array of pointers to the 
column vectors of the upper triangle. 
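A minimal sketch of an access type and an allocator, building a short linked list, is shown below. The type names cell and cell_pointer are hypothetical and far simpler than the structures used in the problem solutions.

```ada
with ada.text_io; use ada.text_io;

procedure access_demo is

   type cell;                           -- incomplete type declaration
   type cell_pointer is access cell;    -- access type designating cells
   type cell is record
      value : integer;
      next  : cell_pointer;             -- access components start out null
   end record;

   list : cell_pointer;                 -- initially null: the empty list
   p    : cell_pointer;

begin
   for k in 1 .. 3 loop
      -- The allocator obtains storage and returns a pointer to the new object.
      list := new cell'(value => k * k, next => list);
   end loop;

   p := list;
   while p /= null loop
      put_line (integer'image (p.value));
      p := p.next;
   end loop;
end access_demo;
```

Because each new cell is linked in at the front, the program prints 9, 4, and 1 in that order. Strong typing ensures that a cell_pointer can designate only objects of type cell.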


Two features of record types play a role in data structuring and ab- 
straction in Ada: discriminant parts and variant parts. Record types 
may be declared with a discriminant part, which serves to parameterize
the type in certain ways. A discriminant part of a record type
definition gives the names and types of special components of the record
known as discriminants, upon which other components may depend. 
Without going into much detail, we will simply say that the discrimi- 
nants of a record type may be used to express the bounds of other 
record components that are arrays, and, in conjunction with variant 
parts, they may be used to determine which of several alternate com- 
ponents exist in the record. Objects of a record type with discrimi- 
nants are thus self-describing in terms of their layout. Depending on 
how they are declared and how they are created, such objects can ei- 
ther be constrained to have the same layout forever (i.e., the discrimi- 
nants, once initialized, are unchangeable) or be permitted to change 
their layout as a result of assignment, within the allowable range of val- 
ues of the discriminants as given by their types. In the problem solu- 
tions, we make only limited use of this facility. In the paraffins prob- 
lem, we use discriminants and variant parts to define a record type for 
paraffin radicals that can have one of two distinct layouts and a record 
type for paraffin molecules that can also have one of two distinct lay- 
outs. In the skyline matrix problem, we use discriminants to hold the 
bounds of a record component that is an array. In both cases, the ob- 
jects of the record types are dynamically allocated with constrained 
layouts. 
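Both uses of discriminants can be sketched briefly. The types shape, vector, and column below are hypothetical stand-ins; they are not the radical, molecule, or skyline matrix types of the solutions.

```ada
with ada.text_io; use ada.text_io;

procedure discriminant_demo is

   type shape_kind is (circle, rectangle);

   -- The discriminant kind determines which variant exists in the record.
   type shape (kind : shape_kind) is record
      case kind is
         when circle =>
            radius : float;
         when rectangle =>
            width, height : float;
      end case;
   end record;

   -- The discriminant length gives the upper bound of an array component.
   type vector is array (positive range <>) of float;
   type column (length : natural) is record
      data : vector (1 .. length);
   end record;

   type column_pointer is access column;

   r : shape (rectangle) := (kind => rectangle, width => 2.0, height => 3.0);
   c : column_pointer := new column (length => 4);   -- layout fixed at allocation

begin
   c.data := (others => 1.5);
   put_line (shape_kind'image (r.kind));
   put_line (natural'image (c.length));
   put_line (integer'image (integer (r.width * r.height)));
end discriminant_demo;
```

Both objects are constrained: r is declared with an explicit discriminant constraint, and the object designated by c receives its constraint from the allocator. Neither can change its layout afterwards.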


The final feature of interest in the problem solutions, subprogram 
and operator overloading, refers to the ability to declare several differ- 
ent subprograms with the same designator (which is either a name or 
a predefined operator symbol) but different parameter/result profiles. 
The subprogram designated by a particular name or operator symbol is 
determined during compilation by overload resolution, using the pa- 
rameter/result types and other contextual information. This feature 
supports abstraction by allowing familiar names or operator symbols to 
be extended to new data types. We use operator overloading in the 
solution to the skyline matrix problem to define an inner-product 
function for vectors, using the customary * infix operator symbol
(which is predefined as the multiplication operator for scalars); thus,
we overload the * operator. (We also mix named parameter associa- 
tion with the more traditional positional parameter association, par- 
ticularly when it allows a subprogram invocation or generic instantia- 
tion to be read like an English phrase.) 
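Overloading of the * operator with an inner product might look like the following sketch. The vector type shown is a hypothetical unconstrained array type, not the one exported by the skyline matrix package, and the function assumes that both operands have the same index range.

```ada
with ada.text_io; use ada.text_io;

procedure overload_demo is

   type vector is array (positive range <>) of float;

   u : constant vector (1 .. 3) := (1.0, 2.0, 3.0);
   v : constant vector (1 .. 3) := (4.0, 5.0, 6.0);

   -- Overloads the predefined "*" with an inner product on vectors.
   function "*" (left, right : vector) return float is
      sum : float := 0.0;
   begin
      -- Assumes left and right share the same index range.
      for k in left'range loop
         -- Overload resolution selects the predefined scalar "*" here.
         sum := sum + left (k) * right (k);
      end loop;
      return sum;
   end "*";

begin
   put_line (integer'image (integer (u * v)));   -- the vector "*" is selected
end overload_demo;
```

The two uses of * are distinguished at compile time by the types of their operands. For the values shown, the inner product is 32.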


In the problem solutions that follow, we discuss language features 
only to the depth required for a reader unfamiliar with the language to 
appreciate the implications, strengths, and limitations of the feature, 
and its application to the problem at hand. We make no attempt to 
describe all the features used, however. 


2 Hamming’s Problem (Extended) 


The problem is as follows: Given a finite increasing sequence
$p_1, \ldots, p_k$ of primes and an integer $n$, output in order of increasing
magnitude and without duplication all integers less than or equal to $n$
of the form

\[ \prod_{i=1}^{k} p_i^{e_i} \]

for $e_i \ge 0$. A hint calls attention to the fact that if $m$ is in the output
sequence, then so is $m \cdot p_i$, provided that it is less than or equal to $n$.


Four solutions to Hamming’s problem, using different degrees of 
tasking, are presented. The first is straightforward and uses no task- 
ing features, while the second is a minor variation of the first that uses 
tasking in a rather artificial way to remove unnecessary determinism 
and coincidentally introduce a small amount of concurrency. The 
third implements an entirely different algorithm and uses tasking in a 
more functional way, and the fourth is an extension of the third in 
which tasks are replicated to achieve greater parallelism. While the 
fourth solution also represents an artificial improvement over its
predecessor, the goal and the method of the improvement are unlike those
embodied in the second solution.


2.1. First Solution: Usual Serial Algorithm 


The usual method of solving this problem involves several parallel 
streams of future output values, one stream corresponding to each 
prime. In each iteration of the “main loop,” a value is appended to the 
tail of each stream, a new output value is produced, and a value is re- 
moved from the head of some of the streams. More precisely, each 
time through the loop, 


• the product of the previous output value and the prime
  associated with the stream is appended to each stream;

• the least of the values at the heads of all the streams is
  determined, and it becomes the next output value; and

• each stream whose head matches that least value has its
  head removed.


Streams are not predefined in Ada, but we assume the availability, 
as a library unit, of a generic package named streams that provides an 
abstract data type called stream complete with appropriate operations 
on objects of that type. Since we need streams of positive integers, we 
instantiate the generic package streams with the predefined subtype 
positive of the predefined type integer to obtain a package that we 
call positive_streams; its stream type is exactly what we seek. We
are not concerned here with the implementation of the generic pack- 
age, which might very well use pointers and linked lists, and for 
brevity we omit even its specification. We assume that the initial state 
of an object of type stream is “empty,” allowing us to dispense with 
explicit initialization of streams; this is a realistic assumption since 
access objects, i.e., pointers, are automatically initialized to null in 
Ada. In this application, we require only the following operations on 
streams, which we assume are provided by the generic package: a
procedure (append) to append a value to the tail of a stream; a function
(head) to return the value at the head of a nonempty stream without 
removing it; and a procedure (behead) to remove and discard the value 
at the head of a nonempty stream. The subprograms head and behead 
can be assumed to raise some exception if applied to an empty stream. 
We have no need for an is_empty predicate; and because we apply 
head and behead only to nonempty streams, we can guarantee that the 
exception will never be raised. 
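Since the specification of the streams package is omitted, the following is only a sketch of what such a package might look like, assuming a singly linked list representation. The formal type name element, the exception name stream_empty, and every detail of the private part are assumptions; only the names streams, stream, append, head, and behead, and the parameter name to, are taken from the text. The package is nested in a demonstration procedure here rather than compiled as a library unit.

```ada
with ada.text_io; use ada.text_io;

procedure streams_demo is

   generic
      type element is private;
   package streams is
      type stream is limited private;
      stream_empty : exception;        -- assumed name for "some exception"
      procedure append (item : in element; to : in out stream);
      function head (s : stream) return element;
      procedure behead (s : in out stream);
   private
      type cell;
      type link is access cell;
      type cell is record
         value : element;
         next  : link;
      end record;
      type stream is record
         front, back : link;           -- both null initially: the empty stream
      end record;
   end streams;

   package body streams is
      procedure append (item : in element; to : in out stream) is
         new_cell : constant link := new cell'(value => item, next => null);
      begin
         if to.front = null then
            to.front := new_cell;
         else
            to.back.next := new_cell;
         end if;
         to.back := new_cell;
      end append;

      function head (s : stream) return element is
      begin
         if s.front = null then
            raise stream_empty;
         end if;
         return s.front.value;
      end head;

      procedure behead (s : in out stream) is
      begin
         if s.front = null then
            raise stream_empty;
         end if;
         s.front := s.front.next;      -- storage reclamation omitted in this sketch
         if s.front = null then
            s.back := null;
         end if;
      end behead;
   end streams;

   package positive_streams is new streams (positive);
   use positive_streams;

   s : stream;                         -- starts out empty, with no explicit initialization

begin
   append (2, to => s);
   append (3, to => s);
   put_line (positive'image (head (s)));
   behead (s);
   put_line (positive'image (head (s)));
end streams_demo;
```

Under these assumptions the program prints 2 and then 3. Making stream a limited private type is a design choice of the sketch; it prevents clients from copying a stream and thereby sharing its cells.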


Our first solution to Hamming’s problem is in the form of a pack- 
age, hammings_problem, that exports a type for unconstrained arrays 
(those not bearing bounds) of positive integers, a package of types and 
operations associated with streams of positive integers, and a proce- 
dure to compute and deliver the required sequence of values given a 
maximal output value and an array of primes in ascending order. The 
array type and the package of stream types and operations are ex- 
ported with the procedure so that they can be used in forming and 
manipulating the procedure’s inputs and outputs (we have chosen to 
deliver the sequence of output values by appending them to an in out 
parameter which is a stream initially passed in as an empty stream). 
The specification of the hammings_problem package is shown in Figure 
3.6 


with streams;
package hammings_problem is
   type positive_array is array (integer range <>) of positive;
   package positive_streams is new streams (positive);
   use positive_streams;
   procedure solver (n       : in positive;
                     primes  : in positive_array;
                     outputs : in out stream);
end;


Figure 3 - Specification of the hammings_problem package


The body of the hammings_problem package is shown in Figure 4. 
The functions min and max are not predefined in Ada, but we assume
the availability, as a library unit, of a generic package named
min_and_max, which we instantiate with the subtype positive. The
result of the instantiation is a package that we call
positive_min_and_max; it exports min and max functions on positive
integers. We are not concerned here with the implementation of the
generic package, and for brevity we omit even its specification. 
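For orientation, one plausible form of such a package is sketched below. It assumes a single generic formal parameter that is a discrete type, which is consistent with the one-parameter instantiation used in the text but is otherwise a guess; the actual package might equally import a private type together with an ordering function. The package is nested in a demonstration procedure rather than compiled as a library unit.

```ada
with ada.text_io; use ada.text_io;

procedure min_and_max_demo is

   generic
      type item is (<>);   -- any discrete type; an assumption of this sketch
   package min_and_max is
      function min (left, right : item) return item;
      function max (left, right : item) return item;
   end min_and_max;

   package body min_and_max is
      function min (left, right : item) return item is
      begin
         if right < left then
            return right;
         else
            return left;
         end if;
      end min;

      function max (left, right : item) return item is
      begin
         if left < right then
            return right;
         else
            return left;
         end if;
      end max;
   end min_and_max;

   package positive_min_and_max is new min_and_max (positive);
   use positive_min_and_max;

begin
   put_line (positive'image (min (12, 9)));
   put_line (positive'image (max (12, 9)));
end min_and_max_demo;
```

With this instantiation the program prints 9 and then 12.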


with min_and_max;
package body hammings_problem is
   package positive_min_and_max is new min_and_max (positive);
   use positive_min_and_max;
   procedure solver (n       : in positive;
                     primes  : in positive_array;
                     outputs : in out stream) is separate;
end;


Figure 4 - Body of the hammings_problem package (first solution) 


A place for the body of the solver procedure is held by a body stub, 
and the proper body of solver is made a subunit of hammings_problem. 
This allows us to keep our compilation units small. The need for that 
economy is not especially strong now, but the solutions to be exhibited 
subsequently do definitely benefit from this practice. The proper body 
of solver, as a separate compilation unit, is shown in Figure 5. 


The solution shown in Figure 5 is somewhat wasteful in that it ap- 
pends values exceeding n, the maximal output value, to the streams, 
even though they will never become part of the output sequence; they 
remain in the streams when the main loop terminates. Worse, it lacks 
robustness because the very computation of such values could theoret- 
ically overflow, bringing the program to a grinding halt. These prob- 
lems could be easily corrected, but the simplicity of the algorithm 
would be obscured by the added detail. 




separate (hammings_problem)
procedure solver (n       : in positive;
                  primes  : in positive_array;
                  outputs : in out stream) is
   infinity : constant positive := n + 1;
   pipes    : array (primes'range) of stream;
   i        : positive := 1;
   new_i    : positive;
begin
   while i <= n loop
      append (i, to => outputs);
      new_i := infinity;
      for k in primes'range loop
         append (i*primes(k), to => pipes(k));
         new_i := min(new_i, head(pipes(k)));
      end loop;
      i := new_i;
      for k in primes'range loop
         if head(pipes(k)) = i then
            behead (pipes(k));
         end if;
      end loop;
   end loop;
end;

Figure 5 - Proper body of the solver procedure (first solution) 


2.2. Second Solution: Processing the Streams in Parallel 


In each iteration of the main loop, the preceding solution cycles 
through the streams in order. The order of visiting the streams is im- 
material, of course, and in fact they could be processed in parallel. 
Thus, that solution is actually overspecified. 


For our second solution, a modified proper body of the solver
procedure is shown in Figure 6, while the specification and body of the
hammings_problem package remain unchanged.


separate (hammings_problem)
procedure solver (n       : in positive;
                  primes  : in positive_array;
                  outputs : in out stream) is
   task type stream_processor is
      entry install_prime (a_prime : in positive);
   end;
   task min_finder is
      entry bid (a_head : in positive);
      entry wait_for_min;
   end;
   infinity : constant positive := n + 1;
   pipes    : array (primes'range) of stream_processor;
   i        : positive := 1;
   task body stream_processor is separate;
   task body min_finder is separate;
begin
   for k in primes'range loop
      pipes(k).install_prime (primes(k));
   end loop;
end;


Figure 6 - Proper body of the solver procedure (second solution) 


A separate task is now devoted to the processing of each stream. The 
pipes array of streams has been replaced by an array (of the same 
name) whose components are of a task type called stream_processor; 
each task in this array declares and processes a single stream, using 
the corresponding prime. There is one additional task, min_finder, 
devoted to finding the minimum of the heads of all the streams at the 
appropriate time. Places for the bodies of the stream_processor task 
type and min_finder single task are held by body stubs; their proper 
bodies are made subunits of solver and are shown later. 


The pipes array of stream_processor tasks is created as part of 
the execution of solver (specifically, during the elaboration of its 
declarative part). The statement list of solver initializes each task
with its corresponding prime, then waits (at its end) for all the tasks
to terminate before returning to its caller. 


Each time through their loop, the stream_processor tasks bid their 
respective stream heads as the prospective minimum to min_finder 
and then wait for min_finder to place the minimum of the bids in i 
(the variable i plays the same role as before but is now global to all the 
tasks). The code implementing the behavior of a stream_processor
task is shown in Figure 7. 


separate (hammings_problem.solver)
task body stream_processor is
   my_prime  : positive;
   my_stream : stream;
begin
   accept install_prime (a_prime : in positive) do
      my_prime := a_prime;
   end;
   while i <= n loop
      append (i*my_prime, to => my_stream);
      min_finder.bid (head(my_stream));
      min_finder.wait_for_min;
      if head(my_stream) = i then
         behead (my_stream);
      end if;
   end loop;
end;


Figure 7 - Proper body of the stream_processor task type 


The pair of rendezvous used for communication and synchronization
with min_finder acts like a “barrier,” preventing the
stream_processor tasks from moving on to the beheading of their
streams until all of them have communicated the value of their stream
heads to min_finder and min_finder has found the minimum of them.
The code implementing the behavior of the min_finder task is shown
in Figure 8.


separate (hammings_problem.solver)
task body min_finder is
   new_i : positive;
begin
   while i <= n loop
      append (i, to => outputs);
      new_i := infinity;
      for k in primes'range loop
         accept bid (a_head : in positive) do
            new_i := min(new_i, a_head);
         end;
      end loop;
      i := new_i;
      for k in primes'range loop
         accept wait_for_min;
      end loop;
   end loop;
end;


Figure 8 - Proper body of the min_finder task


The second solution cannot seriously be expected to yield any
significant speedup, since the grains of parallel work between the
synchronization points are too small in relation to the synchronization
overhead. The serial bottleneck represented by the barrier has an
analog in the first solution and does not, by itself, make matters worse.
The main value of the second solution is to illustrate how tasking can
be used to replace an unnecessary, deterministic ordering by a
completely nondeterministic ordering satisfying the relevant dataflow
properties of the algorithm and nothing else.⁷ (The for loops in
min_finder are used only to count the number of rendezvous to be
performed, not to order them; the loop variable is not referenced
inside the loops.)




2.3. Third Solution: Novel Use of Concurrency 


In an idealized implementation of Ada in which a rendezvous incurs
no overhead, the speedup yielded by the second solution would be
expected to scale up with the number of primes; if there are more
processors than primes, the surplus processors would not be exploited.⁸
The radically different approach to be described next is capable of
being scaled up in proportion to an independent parameter; it can
therefore always be made to use all available processors. (We save for
the fourth solution the details of the actual scaling up and present
here the basic algorithm.)


Our third solution uses a single recirculating stream of future out- 
put values; the stream is recirculating because some of the values that 
are removed from its head are put back on the stream at its tail. In 
each iteration of the main loop, the value at the head of the stream is 
removed and placed in the outputs stream; it is also used to trigger 
the generation, by another task (called products_generator), of a se- 
quence of products of itself and all the primes. Meanwhile—that is, as 
the products are being generated—the remainder of the recirculating 
stream of future output values is read and merged with the stream of 
products as they become available (both streams are in ascending 
order), with the resulting sequence of values being directed back to 
the recirculating stream (i.e., appended to its tail). 


In this solution, in contrast to the earlier ones, we do not generate 
any products greater than n. It is convenient to use the constant in- 
finity, defined as n+1 and previously used for a different purpose, as 
an end-marker both for the stream of products and for the recirculat- 
ing stream of future outputs (where it separates the last value pro- 
duced by one iteration from the first value produced by the next). The 
recirculating stream is initialized to a 1 followed by an end-marker; it 
grows and eventually shrinks in length with succeeding iterations, and 
the algorithm terminates when the recirculating stream is reduced to 
just the end-marker (i.e., when the first value read from the
recirculating stream in an iteration of the main loop is the end-marker). The
dataflow characteristics of this algorithm are depicted in Figure 9.


[Dataflow diagram. Labels: Recirculating stream of future outputs
(followed by an end-marker); First value following an end-marker;
Append to output stream; PRODUCTS_GENERATOR task: generate a stream of
products of the current output value and the primes (followed by an
end-marker); Stream of PRODUCTS (followed by an end-marker); Remaining
values through next end-marker; Preserve end-markers.]


Figure 9 - Dataflow in the novel algorithm for solving
Hamming’s problem
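

The recirculating-stream idea can be tried out without any tasking.
The following is a minimal sequential sketch, not the chapter's
solution: it assumes a recent Ada (Ada 2005 or later), uses the standard
Ada.Containers.Doubly_Linked_Lists package as a simple queue in place
of the stream abstraction, and generates each batch of products before
the merge rather than concurrently with it. The names
Recirculating_Sketch, Queues, and Read, and the values chosen for N and
Primes, are illustrative assumptions.

```ada
with Ada.Text_IO;
with Ada.Containers.Doubly_Linked_Lists;

procedure Recirculating_Sketch is
   --  Illustrative inputs (assumptions, not taken from the chapter).
   N      : constant Positive := 30;
   Primes : constant array (1 .. 3) of Positive := (2, 3, 5);

   Infinity : constant Positive := N + 1;   --  the end-marker

   package Queues is new Ada.Containers.Doubly_Linked_Lists (Positive);
   use Queues;

   Recirc  : List;       --  recirculating stream of future outputs
   I, J, P : Positive;

   --  Remove the value at the head of a queue and deliver it in Item.
   procedure Read (Item : out Positive; From : in out List) is
   begin
      Item := From.First_Element;
      From.Delete_First;
   end Read;

begin
   Recirc.Append (1);
   Recirc.Append (Infinity);
   loop
      Read (I, From => Recirc);
      exit when I = Infinity;           --  only the end-marker was left
      Ada.Text_IO.Put (Positive'Image (I));
      declare
         Products : List;               --  products of I and the primes
      begin
         for K in Primes'Range loop
            exit when I > N / Primes (K);   --  never exceed N
            Products.Append (I * Primes (K));
         end loop;
         Products.Append (Infinity);

         --  Merge the rest of this pass with the products, writing the
         --  result (and one end-marker) back to the tail of Recirc.
         Read (J, From => Recirc);
         Read (P, From => Products);
         loop
            if J < P then
               Recirc.Append (J);
               Read (J, From => Recirc);
            elsif J > P then
               Recirc.Append (P);
               Read (P, From => Products);
            else
               Recirc.Append (J);       --  equal values are kept once
               exit when J = Infinity;
               Read (J, From => Recirc);
               Read (P, From => Products);
            end if;
         end loop;
      end;
   end loop;
   Ada.Text_IO.New_Line;
end Recirculating_Sketch;
```

With the inputs assumed above, the program prints the values
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30. As in the chapter's
algorithm, the early exit from the product loop relies on the primes
being supplied in ascending order.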


The products_generator task proceeds at its own pace and could
either generate products well before they are needed (in which case
they remain in the products stream until needed) or generate them so
slowly that the products stream would be empty when a product is
needed from it. Therefore the stream abstractions previously assumed
are inadequate for this application; the fact that the products stream
is written and read by different tasks should be a clue to that
inadequacy. We now require that a task trying to read a stream that
happens to be empty be suspended until some other task writes to that
stream. Streams of this type will be called “buffers” to distinguish
them from the earlier kind. Indeed, we now assume the availability, as
a library unit, of a generic package named buffers that provides an
abstract data type called buffer complete with appropriate operations
on objects of that type. Since we need buffers of positive integers, we
instantiate the generic package buffers with the subtype positive to
obtain a package that we call positive_buffers; its buffer type is
exactly what we seek. As we did with streams, we assume that the initial
state of an object of type buffer is “empty,” allowing us to dispense
with explicit initialization of buffers. For brevity, we omit the body and 
even the specification of the generic package. We remark, however, 
that it is convenient to combine the previous head and behead opera- 
tions into a single procedure called read, which waits for the given 
buffer to become nonempty, then removes and delivers the item at its 
head. We call the buffer analog of append, to keep our abstractions 
separate, write. Finally, we note that the implementation of buffers 
will now surely use tasks to achieve the desired suspension of the 
caller of read when required; indeed, the type buffer can be imple- 
mented as a task type whose behavior is that of a monitor. 
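

Since the specification and body of the buffers package are omitted,
the following sketch shows one shape they might take. It is an
assumption, not the author's package: the buffer is a task type acting
as a monitor over an unbounded linked list, the entry names Put and Get
are hypothetical, and the whole thing is nested in a small test program
(which needs Ada 95 or later because an object declaration follows a
package body).

```ada
with Ada.Text_IO;

procedure Buffers_Sketch is

   generic
      type Item is private;
   package Buffers is
      type Buffer is limited private;   --  initially empty
      procedure Write (X : in Item; To : in out Buffer);
      --  Waits until From is nonempty, then removes and delivers its head.
      procedure Read (X : out Item; From : in out Buffer);
   private
      task type Buffer is
         entry Put (X : in Item);
         entry Get (X : out Item);
      end Buffer;
   end Buffers;

   package body Buffers is
      task body Buffer is
         type Node;
         type Link is access Node;
         type Node is record
            Value : Item;
            Next  : Link;
         end record;
         Head, Tail : Link;   --  both null while the buffer is empty
      begin
         loop
            select
               accept Put (X : in Item) do
                  if Head = null then
                     Head := new Node'(Value => X, Next => null);
                     Tail := Head;
                  else
                     Tail.Next := new Node'(Value => X, Next => null);
                     Tail := Tail.Next;
                  end if;
               end Put;
            or
               when Head /= null =>     --  readers wait while empty
                  accept Get (X : out Item) do
                     X := Head.Value;
                     Head := Head.Next; --  storage is not reclaimed here
                  end Get;
            or
               terminate;
            end select;
         end loop;
      end Buffer;

      procedure Write (X : in Item; To : in out Buffer) is
      begin
         To.Put (X);
      end Write;

      procedure Read (X : out Item; From : in out Buffer) is
      begin
         From.Get (X);
      end Read;
   end Buffers;

   package Positive_Buffers is new Buffers (Positive);
   use Positive_Buffers;

   B : Buffer;
   V : Positive;

   task Producer;
   task body Producer is
   begin
      for K in 1 .. 3 loop
         delay 0.1;                     --  let the reader get ahead
         Write (K * K, To => B);
      end loop;
   end Producer;

begin
   for K in 1 .. 3 loop
      Read (V, From => B);              --  suspends until a value arrives
      Ada.Text_IO.Put_Line (Positive'Image (V));
   end loop;
end Buffers_Sketch;
```

The program prints 1, 4, and 9 on separate lines. The guard on Get is
what suspends a reader of an empty buffer, and the terminate
alternative lets each buffer task end once its master has finished. The
list is unbounded on purpose: a bounded buffer would let a task that
both reads and writes the same buffer block on its own write.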


In the third solution, we retain the use of streams only for the
outputs of solver and use buffers elsewhere. In theory, the recirculating
“stream” could indeed be a stream instead of a buffer, since it is read
and written by the same task; we have chosen to realize it as a buffer,
however, in anticipation of the fourth solution (in which the analogous
object must be a buffer), so as to avoid accentuating nonessential
differences between the current solution and that variation. As in the
second solution, the specification of the hammings_problem package
remains unchanged, but this time we make a slight change in its body:
for purely stylistic reasons, that is where we choose to instantiate the
buffers generic package. The revised body of the hammings_problem
package is shown in Figure 10.


The revised proper body of the solver procedure is shown in
Figure 11. The products_generator task is created and subsequently
terminates, when it reaches the end of its execution, each time
through the main loop. With a slight redesign, we could arrange to
create it just once, enclose its processing within a loop, synchronize
that loop with the main loop in the solver procedure, and make it
terminate at the appropriate time. That redesign would require the
use of a rendezvous for the synchronization, and distributed
termination for the products_generator task. But since simplicity is more
desirable than efficiency in these solutions, we have chosen the
method having conceptually simpler semantics.


with min_and_max, buffers;
package body hammings_problem is
   package positive_min_and_max is new min_and_max (positive);
   package positive_buffers is new buffers (positive);
   use positive_min_and_max, positive_buffers;
   procedure solver (n       : in positive;
                     primes  : in positive_array;
                     outputs : in out stream) is separate;
end;


Figure 10 - Body of the hammings_problem package (third solution)


2.4. Fourth Solution: Dataflow Loop Unrolling 


The technique of “loop unrolling” can be applied in a unique way to 
the dataflow loop in Figure 9 by considering all the processing of val- 
ues between the moment they are removed from the recirculating 
stream and the moment they are put back in it to be a single “stage” 
of processing; by encapsulating a stage of processing in a task; by 
replicating that task any number of times; and by interposing buffers 
(instead of streams) between the tasks—that is, between the stages. By 
spreading the processing over a larger number of tasks, the result, 
whose dataflow characteristics are shown in Figure 12, achieves po- 
tentially greater parallelism—principally because several streams of 
products can be generated and merged with the recirculating stream 
of future outputs concurrently. It should be noted that the “recirculat- 
ing stream” of future output values is now spread out over the whole 
dataflow loop, with some values in each intertask buffer and some in 
local variables of each task, and with only one end-marker located 
somewhere around the loop. Although many tasks now append to the 
outputs stream, only one at a time does so as the “front” of the 
stream of future outputs advances from one stage to the next, and thus 
no synchronization problems arise from this behavior. 




separate (hammings_problem)
procedure solver (n       : in positive;
                  primes  : in positive_array;
                  outputs : in out stream) is
   infinity : constant positive := n + 1;
   recirc   : buffer;
   i        : positive;
begin
   write (1, to => recirc);
   write (infinity, to => recirc);
   loop
      read (i, from => recirc);
      exit when i = infinity;
      append (i, to => outputs);
      declare
         j, p     : positive;
         products : buffer;
         task products_generator;
         task body products_generator is
         begin
            for k in primes'range loop
               exit when i > n/primes(k);
               write (i*primes(k), to => products);
            end loop;
            write (infinity, to => products);
         end;
      begin
         read (j, from => recirc);
         read (p, from => products);
         loop
            if j < p then
               write (j, to => recirc);
               read (j, from => recirc);
            elsif j > p then
               write (p, to => recirc);
               read (p, from => products);
            else
               write (j, to => recirc);
               exit when j = infinity;
               read (j, from => recirc);
               read (p, from => products);
            end if;
         end loop;
      end;
   end loop;
end;


Figure 11 - Proper body of the solver procedure (third solution) 


[Dataflow diagram. Labels: Stream of OUTPUTS; STAGE (repeated);
within the STAGE task: First value following an end-marker; To output
stream; PRODUCTS_GENERATOR task: generate a stream of products of the
current output value and the primes (followed by an end-marker); Merge
streams; Remaining values through next end-marker.]


Figure 12 - Dataflow in the novel algorithm for solving
Hamming’s problem with loop unrolling


The code for the fourth solution, the variant of the novel algorithm 
with dataflow loop unrolling, is split between two figures. Figure 13 
shows the revised proper body of the solver procedure, in which the 
body of the stage task type is represented by a body stub. Figure 14 
shows the proper body of stage. 


We have given solver an extra parameter, m, which is the depth of
the loop unrolling (number of stages); to save space, we omit showing
the trivial change in the specification and body of the
hammings_problem package necessitated by this addition. Ideally, the value of m
should be increased in proportion to the number of processors
available, at a rate that is probably best determined empirically (it depends
on how much of a processor's power is consumed by the tasks of each
stage, among other things).




separate (hammings_problem)
procedure solver (n, m    : in positive;
                  primes  : in positive_array;
                  outputs : in out stream) is
   task type stage is
      entry identify_self (k : in positive);
   end;
   infinity : constant positive := n + 1;
   stages   : array (1 .. m) of stage;
   sbuffers : array (1 .. m) of buffer;
   done     : array (1 .. m) of boolean := (others => false);
   task body stage is separate;
begin
   write (1, to => sbuffers(1));
   write (infinity, to => sbuffers(1));
   for k in stages'range loop
      stages(k).identify_self (k);
   end loop;
end;


Figure 13 - Proper body of the solver procedure (fourth solution) 


In this solution, most of the work is done by the dependent tasks of 
solver. After “priming” the first stage and then identifying to each 
stage its position in the dataflow loop, solver waits (at its end) for its 
dependent tasks to terminate before returning to its caller; when it fi- 
nally does so, the outputs stream will have been completed. 


As shown by Figure 14, each stage’s first action is to receive its po- 
sition in the dataflow loop from solver. It then computes the position 
of its successor. These positions are, of course, indices in the arrays 
declared by solver, so that the successor of the last component in 
each of these arrays is the first component. Each stage uses its own 
index and that of its successor to determine which intertask buffer is 
its input buffer and which is its output buffer. 
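

The wrap-around indexing can be checked in isolation. This small
sketch is only an illustration; the procedure name Ring_Sketch and the
value of M are assumptions.

```ada
with Ada.Text_IO;

procedure Ring_Sketch is
   M : constant Positive := 4;   --  illustrative number of stages
begin
   for Self in 1 .. M loop
      --  Stage Self reads buffer Self and writes the buffer of its
      --  successor; the last stage wraps around to the first.
      Ada.Text_IO.Put_Line
        ("stage" & Integer'Image (Self) &
         " -> successor" & Integer'Image (Self mod M + 1));
   end loop;
end Ring_Sketch;
```

For M = 4 the successors are 2, 3, 4, and 1. For M = 1 the same
expression yields 1 mod 1 + 1 = 1, so a lone stage is its own successor.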


separate (hammings_problem.solver)
task body stage is
   self, successor : positive;
   i               : positive;  -- current output value, local to this stage
begin
   accept identify_self (k : in positive) do
      self := k;
   end;
   successor := self mod m + 1;
   loop
      read (i, from => sbuffers(self));
      exit when i = infinity;
      append (i, to => outputs);
      declare
         j, p     : positive;
         products : buffer;
         task products_generator;
         task body products_generator is
         begin
            for k in primes'range loop
               exit when i > n/primes(k);
               write (i*primes(k), to => products);
            end loop;
            write (infinity, to => products);
         end;
      begin
         read (j, from => sbuffers(self));
         read (p, from => products);
         loop
            if j < p then
               write (j, to => sbuffers(successor));
               read (j, from => sbuffers(self));
            elsif j > p then
               write (p, to => sbuffers(successor));
               read (p, from => products);
            else
               write (j, to => sbuffers(successor));
               exit when j = infinity;
               read (j, from => sbuffers(self));
               read (p, from => products);
            end if;
         end loop;
      end;
   end loop;
   done(self) := true;
   if not done(successor) then
      write (infinity, to => sbuffers(successor));
   end if;
end;


Figure 14 - Proper body of the stage task type 




When a stage reads the end-marker as the first item from its input 
buffer, in an execution of its main loop, it sets its own component of 
the done array to true, propagates the end-marker if required, and 
then terminates. Propagation of the end-marker from one stage to its 
successor is required if the successor has not yet read an end-marker 
as the first item from its input buffer, as indicated by the successor's 
component of the done array. Were the end-marker not propagated in 
this way, stages would deadlock waiting for input that would never ar- 
rive, and solver would never terminate. But, after the end-marker 
(not preceded by any future outputs) has made its way around the 
dataflow loop once, all of the stages will have terminated, and there 
will thus be no need to propagate it further. 


Care has been taken to ensure that the fourth solution will work for 
all values of m—in particular, even for the value 1. In that case, the 
successor of a stage is itself, and changes made by the stage to its 
“right-hand” environment are immediately seen as having an effect on 
its “left-hand” environment (and vice versa). The similarities between 
Figures 11 and 14 suggest that a further level of subprogram abstrac- 
tion would have been appropriate. 


3 The Paraffins Problem 


The chemical formula for paraffin molecules is $C_iH_{2i+2}$. The
problem is as follows: Given an integer n, output, in order of increasing
size, structural representations of all paraffin molecules for $i \le n$,
including all isomers but no duplicates.


The paraffins problem was designed to reveal the strengths of ap- 
plicative languages. Turner’s original solution [10] in the applicative 
language KRC makes extensive use of set abstraction and higher-order 
functions to produce a compact and elegant program at the cost of 
some inefficiency. His program was designed to produce all paraffin 
molecules of a given size by attaching paraffin radicals of appropriate
sizes to a leading carbon atom without regard, initially, to the fact that
this simple process yields many different representations (different 
orientations) of each of the distinct paraffin isomers. Since the prob- 
lem calls for producing only one representation of each isomer, 
Turner’s solution filters out the duplicates. As each new paraffin 
molecule is generated, it is tested for distinctness from all previously 
retained molecules. The test for distinctness consists of checking 
each of the previously retained molecules for membership in the set of 
all structurally equivalent reorientations of the newly generated 
molecule. The latter set is obtained on the fly by computing the clo- 
sure of the set containing the newly generated molecule under various 
equivalence-preserving transformations (rotations, inversions, and 
swapping of the paraffin radicals attached to each of the molecule’s 
carbon atoms). For this process to be workable, implicit or explicit 
garbage collection must be at work. The KRC solution is concise in 
part because garbage collection is implicit and in part because no 
premium is placed on avoiding the recomputation of a previously com- 
puted closure set each time a new duplicate is found. The strategy 
adopted by Turner also was influenced by another feature of KRC—lazy 
evaluation—which at least partially reclaims some of the lost efficiency 
by deferring the computation of a set element until it is needed (a 
duplicate is typically found when its closure set is still incomplete). 


A programmer faced with solving this problem in a language not 
having the powerful and convenient features of KRC (or other applica- 
tive languages) is strongly motivated to forgo Turner’s solution—if 
programming explicit garbage collection is unappealing, and if strict 
evaluation and repeated computation are unacceptably inefficient—and 
instead search for a strategy that generates paraffin molecules so that 
duplicates are avoided from the outset. The Ada solution, like most of
the others in this book, follows such a strategy.⁹


A paraffin molecule can be regarded as a free tree (see [8],
pp. 362-363) whose vertices correspond to the carbon atoms and whose edges
correspond to the carbon-carbon bonds. The distinct paraffin isomers
of size i (that is, having i carbon atoms) therefore correspond to the
structurally distinct free trees having i vertices. Our program
represents paraffin molecules as oriented, ordered trees (whose 
vertices we refer to here as nodes)—not as free trees—and the essence 
of our strategy for avoiding duplicates lies in the mapping between the 
vertices of the free-tree representation of a paraffin molecule and the 
nodes of its corresponding representation as an oriented, ordered 
tree in the program. 


It is apparent that the unordered neighbors of a vertex of the free- 
tree representation will become ordered subnodes of the node to 
which it is mapped in the representation used in the program. 
Varying the order of the subnodes results in different representations 
of the same paraffin molecule—that is, in structurally equivalent 
paraffin molecules. We avoid this source of duplicates by using a 
unique (lexicographic) ordering for the subnodes of a node. Note that 
whereas vertices of the free tree may have fewer than four neighbors 
(i.e., when they represent carbon atoms to which some hydrogen
atoms are attached), the corresponding nodes in the tree in the pro- 
gram have either four subnodes (in the case of the root node) or three 
subnodes and an ancestor node (in the case of a node other than the 
root). Also note that some of the subnodes represent hydrogen atoms; 
in the program, the positions at which hydrogen atoms are attached to 
carbon atoms are explicit. 


It is also apparent that the unoriented edges of the free tree will 
become directed links between nodes in the tree in the program and 
that some vertex will be mapped to the root node. Varying the vertex 
that is mapped to the root node again results in different representa- 
tions of the same paraffin molecule. We avoid this source of duplicates 
by appealing to the centroid theorem for free trees (see [8], p. 387): a 
free tree of odd size has a single centroid (vertex of minimum height, 
where the height of a vertex is the size of its largest subtree), while
one of even size has either a single centroid or a pair of adjacent
centroids. We canonicalize the representation of single-centroid paraffin
molecules by selecting the centroid as the root node in the program 
tree. In the case of double-centroid paraffin molecules, we use as root 
node a node that corresponds not to any carbon atom but to the car- 
bon-carbon bond between the centroids. It has exactly two subnodes, 
corresponding to the two centroids; they, of course, are lexicographi- 
cally ordered as well. 


To summarize, our strategy avoids duplicates by canonically select- 
ing root nodes and lexicographically ordering subnodes. 
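

To make the centroid rule concrete, the following sketch finds the
centroid or centroids of a small free tree by brute force, using the
definition quoted above (a centroid is a vertex whose largest subtree is
as small as possible). It is an illustration only and is not part of
the chapter's program; the adjacency-list representation, the names
Tree, Size, Largest_Subtree, and Show_Centroids, and the two example
carbon skeletons are all assumptions.

```ada
with Ada.Text_IO;

procedure Centroid_Sketch is
   Max_Degree : constant := 4;   --  a carbon atom has at most 4 neighbors

   --  Each vertex lists its neighbors; 0 marks an unused slot.
   type Neighbor_List is array (1 .. Max_Degree) of Natural;
   type Tree is array (Positive range <>) of Neighbor_List;

   --  Carbon skeletons only (hydrogens omitted); numbering is arbitrary.
   Butane    : constant Tree (1 .. 4) :=
     ((2, 0, 0, 0), (1, 3, 0, 0), (2, 4, 0, 0), (3, 0, 0, 0));
   Isobutane : constant Tree (1 .. 4) :=
     ((2, 0, 0, 0), (1, 3, 4, 0), (2, 0, 0, 0), (2, 0, 0, 0));

   --  Number of vertices reachable from V without going back to From.
   function Size (T : Tree; V, From : Positive) return Positive is
      Total : Positive := 1;
   begin
      for K in Neighbor_List'Range loop
         if T (V)(K) /= 0 and then T (V)(K) /= From then
            Total := Total + Size (T, T (V)(K), V);
         end if;
      end loop;
      return Total;
   end Size;

   --  Size of the largest subtree hanging off vertex V.
   function Largest_Subtree (T : Tree; V : Positive) return Natural is
      Largest : Natural := 0;
   begin
      for K in Neighbor_List'Range loop
         if T (V)(K) /= 0 then
            Largest := Natural'Max (Largest, Size (T, T (V)(K), V));
         end if;
      end loop;
      return Largest;
   end Largest_Subtree;

   procedure Show_Centroids (Name : String; T : Tree) is
      Least : Natural := Natural'Last;
   begin
      for V in T'Range loop
         Least := Natural'Min (Least, Largest_Subtree (T, V));
      end loop;
      Ada.Text_IO.Put (Name & ":");
      for V in T'Range loop
         if Largest_Subtree (T, V) = Least then
            Ada.Text_IO.Put (Integer'Image (V));
         end if;
      end loop;
      Ada.Text_IO.New_Line;
   end Show_Centroids;

begin
   Show_Centroids ("butane", Butane);
   Show_Centroids ("isobutane", Isobutane);
end Centroid_Sketch;
```

The program prints "butane: 2 3" and "isobutane: 2". The straight chain
of four carbons has a pair of adjacent centroids, so its canonical root
is the bond between them; the branched skeleton has a single centroid,
which becomes the root.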


Odd-sized paraffin molecules and even-sized single-centroid
paraffin molecules are here called “carbon-centered paraffins,” or CCPs;
the root nodes of their trees correspond to carbon atoms and have
four subnodes, each the root of a subtree representing a paraffin
radical. The four radicals of a CCP of size i each have size less than or
equal to the floor of (i - 1)/2 and total i - 1 in size. Even-sized
double-centroid paraffin molecules are here called “bond-centered
paraffins,” or BCPs; the root nodes of their trees correspond to
carbon-carbon bonds and have two subnodes, each the root of a subtree
representing a paraffin radical. The two radicals of a BCP of size i each have
size exactly equal to i/2 and therefore total i in size. For their part,
the root nodes of paraffin radicals of size i, for i > 0, correspond to
carbon atoms and have three subnodes, each the root of a subtree
representing a paraffin radical. (Thus, the data structure for paraffin
radicals is recursive.) The three subradicals of such a paraffin radical each
have size less than or equal to i - 1 and total i - 1 in size. A paraffin
radical of size 0 is just a hydrogen radical; its root node corresponds
to a hydrogen atom and has no subnodes. In the program, all three
kinds of objects are constructed by the same process, which amounts
to attaching some number (two, three, or four) of paraffin radicals of
appropriate sizes to some other node. Restrictions on the maximum
sizes of the attached radicals guarantee that the node at the root of a
CCP corresponds to the centroid, and similarly that the pair of nodes
immediately descendent from the root of a BCP always correspond to
the adjacent centroids; this property, together with the lexicographic
ordering of the attached radicals, guarantees the avoidance of
duplicates.
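

The size restrictions just stated are enough to count the isomers
without building them. The following sketch does only that counting; it
is not the chapter's program, which constructs the molecules
themselves. It treats an ordered, duplicate-free tuple of radicals as a
multiset, so the number of ways to pick K radicals from R distinct ones
is the binomial coefficient C(R + K - 1, K). The names Count_Paraffins,
Multisets, and Ways, and the bound N = 8, are assumptions.

```ada
with Ada.Text_IO;

procedure Count_Paraffins is
   N : constant Positive := 8;   --  illustrative largest molecule size

   --  R (S) = number of distinct paraffin radicals of size S;
   --  R (0) = 1 is the hydrogen radical, the rest are filled in below.
   R : array (0 .. N / 2) of Positive := (others => 1);

   Count : Natural;

   --  Ways to choose K items, repetition allowed, from Kinds items.
   function Multisets (Kinds : Positive; K : Natural) return Positive is
      Result : Positive := 1;
   begin
      for J in 1 .. K loop
         Result := Result * (Kinds - 1 + J) / J;   --  division is exact
      end loop;
      return Result;
   end Multisets;

   --  Number of multisets of Slots radicals, each of size 0 .. Largest,
   --  whose sizes total exactly Total.
   function Ways (Slots, Total : Natural; Largest : Integer) return Natural is
      Sum : Natural := 0;
   begin
      if Largest < 0 then
         if Slots = 0 and Total = 0 then
            return 1;
         else
            return 0;
         end if;
      end if;
      for K in 0 .. Slots loop          --  K radicals of size Largest
         exit when K * Largest > Total;
         Sum := Sum + Multisets (R (Largest), K)
                      * Ways (Slots - K, Total - K * Largest, Largest - 1);
      end loop;
      return Sum;
   end Ways;

begin
   --  Radicals of size S: three subradicals totalling S - 1.
   for S in 1 .. N / 2 loop
      R (S) := Ways (3, S - 1, S - 1);
   end loop;

   for I in 1 .. N loop
      --  CCPs: four radicals, each at most (I - 1)/2, totalling I - 1.
      Count := Ways (4, I - 1, (I - 1) / 2);
      --  BCPs (even sizes only): two radicals of size exactly I/2.
      if I mod 2 = 0 then
         Count := Count + Multisets (R (I / 2), 2);
      end if;
      Ada.Text_IO.Put_Line (Integer'Image (I) & ":" & Integer'Image (Count));
   end loop;
end Count_Paraffins;
```

For sizes 1 through 8 the program reports 1, 1, 1, 2, 3, 5, 9, and 18
isomers. Size 6, for example, has 2 CCPs and 3 BCPs, which accounts for
the five hexanes.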


3.1. Serial Solution 


For the sake of simplicity, we develop in this section a completely 
serial solution to the paraffins problem; in the next section, we discuss 
the opportunities for parallelism presented by the problem and obtain 
one parallel solution by a straightforward modification of a small part 
of the serial solution. 


At the top level, our serial solution is in the form of a function, 
paraffins (shown in Figure 15), that takes a positive integer n, and re- 
turns an array, indexed by the values 1 to n, whose ith component is a 
list of the unique paraffin isomers of size i. The function is compiled 
in the context of some application-dependent types for radicals and 
molecules that are defined in the two library packages (radicals and 
molecules) named in its context clauses. Certainly the function needs 
some such types; presumably the larger application of which the func- 
tion is a part does, too. (The specifications and bodies of the packages 
of types are shown in Figures 16 through 19.) The paraffins function 
first sequentially creates, and internally stores in the components of 
the local array r_array, lists of paraffin radicals of sizes 0 to n/2 
(integer division in Ada truncates towards zero, which for positive 
operands is equivalent to delivering the floor of the quotient); n/2 is 
the size of the largest radical that will be needed for a paraffin 
molecule of size n. The radicals are created in order of size because
smaller radicals are needed for the construction of larger ones.¹⁰
Next, the function sequentially creates lists of paraffin molecules of
sizes 1 to n, storing in the i-th component of m_array the list for size
i: first the CCPs, and then (for even sizes) the BCPs. Finally, it returns
m_array. (The contents of the function are further discussed later.)


with radicals, molecules;
use radicals, molecules;
function paraffins (n : positive) return array_of_molecule_lists is
   type array_of_naturals is array (positive range <>) of natural;
   use radical_lists, molecule_lists;
   r_array : array_of_radical_lists (0 .. n/2);
   m_array : array_of_molecule_lists (1 .. n);
   generic
      with procedure apply_to_each (tuple : in array_of_radicals);
   procedure enum_rad_tuples (p : in array_of_naturals);
   procedure enum_rad_tuples (p : in array_of_naturals) is separate;
   procedure generate_radicals_of_size (i : in positive) is separate;
   procedure generate_ccps_of_size (i : in positive) is separate;
   procedure generate_bcps_of_size (i : in positive) is separate;
begin
   append (hydrogen_radical, to => r_array(0));
   for i in 1 .. n/2 loop
      generate_radicals_of_size (i);
   end loop;
   for i in 1 .. n loop
      generate_ccps_of_size (i);
      if i mod 2 = 0 then
         generate_bcps_of_size (i);
      end if;
   end loop;
   return m_array;
end;


Figure 15 - Self-specifying body of the paraffins function
(serial version)


Let’s look at the packages that declare the problem-dependent
types used in the solution. The specification of the package radicals
is shown in Figure 16. Radicals are trees whose nodes are of the type
radical_node, which is a record type with a discriminant indicating
whether the node represents a hydrogen radical (which has no
subnodes) or a carboniferous radical (which has three). The type radical
is an access type whose designated type is radical_node; thus a
radical is represented by the pointer to its root node, which if not
degenerate contains pointers to its subnodes, etc. An unconstrained array
type providing for arrays of radicals is also declared, as are several
constrained subtypes thereof with fixed numbers of components; one
of these is used in the declaration of radical_node. Linked lists of
radicals are obtained by instantiating with the type radical a generic
package called lists which, because of its utility in diverse
applications, we assume to be available as an application-independent library
unit; the specification of radicals is compiled in its context. We do
not show its definition here, and we merely remark that we assume it
exports at least a type called list; head, tail, and is_empty functions;
and an append procedure. We furthermore assume that lists are
initially empty. Another unconstrained array type providing for arrays of
linked lists of radicals is declared, and finally two functions are
declared: one for allocating, and returning a pointer to, an object of type
radical_node representing a hydrogen radical, and one for allocating,
and returning a pointer to, an object of type radical_node
representing a carboniferous paraffin radical made from pointers to the
objects of type radical_node that are its subnodes.


with lists;
package radicals is
   type radical_kind is (hydrogen, carboniferous);
   type radical_node (kind : radical_kind);
   type radical is access radical_node;
   type array_of_radicals is array (positive range <>) of radical;
   subtype two_radicals is array_of_radicals (1 .. 2);
   subtype three_radicals is array_of_radicals (1 .. 3);
   subtype four_radicals is array_of_radicals (1 .. 4);
   type radical_node (kind : radical_kind) is record
      case kind is
         when hydrogen =>
            null;
         when carboniferous =>
            carbon_neighbors : three_radicals;
      end case;
   end record;
   package radical_lists is new lists (radical);
   subtype radical_list is radical_lists.list;
   type array_of_radical_lists is
      array (natural range <>) of radical_list;
   function hydrogen_radical return radical;
   function radical_made_from (subradicals : three_radicals)
      return radical;
end;


Figure 16 - Specification of the radicals package


The body of the radicals package is shown in Figure 17; it con- 
tains the bodies of the radical-constructing functions declared in its 
specification. These are simple enough that they should need no ex- 
planation. 


package body radicals is
   function hydrogen_radical return radical is
   begin
      return new radical_node'(kind => hydrogen);
   end;

   function radical_made_from (subradicals : three_radicals)
      return radical is
   begin
      return new radical_node'(kind => carboniferous,
                               carbon_neighbors => subradicals);
   end;
end;


Figure 17 - Body of the radicals package 


The specification of the molecules package is shown in Figure 18.
Molecules are trees whose root nodes are of the type molecule_node,
which is a record type with a discriminant indicating whether the
node represents a BCP (which has two subnodes) or a CCP (which has
four). The type molecule is an access type whose designated type is
molecule_node; thus a molecule is represented by the pointer to its
root node, which contains pointers to its subnodes, etc. Note that the
types molecule_node and molecule, unlike the types radical_node
and radical, are not recursive types. That is, the subnodes of a
molecule are not molecules but radicals. To achieve visibility to the
types associated with radicals, we compile the specification of
molecules in the context of radicals (that is, it “withs” radicals).
Linked lists of molecules are obtained by instantiating with the type
molecule the previously discussed generic package lists, which is
therefore also named in the context clause of the specification of
molecules. An unconstrained array type providing for arrays of linked
lists of molecules is declared, and finally we declare a function for
allocating, and returning a pointer to, an object of type molecule_node,
made from pointers to the objects of type radical_node that are its
subnodes.


with radicals, lists;
use radicals;
package molecules is
   type molecule_kind is (bond_centered, carbon_centered);
   type molecule_node (kind : molecule_kind) is record
      case kind is
         when bond_centered =>
            bond_neighbors : two_radicals;
         when carbon_centered =>
            carbon_neighbors : four_radicals;
      end case;
   end record;
   type molecule is access molecule_node;
   package molecule_lists is new lists (molecule);
   subtype molecule_list is molecule_lists.list;
   type array_of_molecule_lists is
      array (positive range <>) of molecule_list;
   function molecule_made_from (radicals : array_of_radicals)
      return molecule;
end;


Figure 18 - Specification of the molecules package 




The body of the molecules package is shown in Figure 19; it con- 
tains the body of the molecule-constructing function declared in its 
specification. It is simple enough that it should need no explanation. 


package body molecules is
   function molecule_made_from (radicals : array_of_radicals)
      return molecule is
   begin
      if radicals'length = 2 then
         return new molecule_node'(kind => bond_centered,
                                   bond_neighbors => radicals);
      else -- Must be 4.
         return new molecule_node'(kind => carbon_centered,
                                   carbon_neighbors => radicals);
      end if;
   end;
end;


Figure 19 - Body of the molecules package 


The reader should return to the body of paraffins (Figure 15) for a 
moment. The function of the use clause in the context clause pre- 
ceding the paraffins function is to provide direct visibility to the 
names declared in both radicals and molecules. These names in- 
clude radical_lists and molecule_lists, the packages obtained in 
radicals and molecules by the instantiation therein of lists. 
However, the visible names do not include those of the operations on 
lists (head, tail, append, is_empty), since those operations are not 
declared in radicals or molecules. It is the function of the second 
use clause, the one in the declarative part of the paraffins function, 
to provide direct visibility to them. Note that the same set of subpro- 
gram names is exported by both radical_lists and molecule_lists. 
How does the compiler know from which of these packages the name 
append, appearing later in paraffins, comes? The answer is that the 
overloaded names of the two append procedures are disambiguated by 
overload resolution, based on the types of the actual parameters in a 
call; the same is true of calls to head, tail, is_empty, and other calls 
to append in other parts of the solution. No such overload resolution 
occurs, however, for the type name list exported by both 
radical_lists and molecule_lists; in fact, the two occurrences of the 
identical type name cancel each other out, and neither is directly vis- 
ible by the simple name list in paraffins or in subunits thereof. 
That is why we declared the subtype radical_list in the specification 
of radicals as being synonymous with the type list exported by 
radical_lists, and the subtype molecule_list in the specification of 
molecules as being synonymous with the type list exported by 
molecule_lists. So, in paraffins and its subunits, we use the name 
radical_list or molecule_list as appropriate, instead of the ambigu- 
ous (and not even directly visible) name list.1 
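
The visibility rules just described can be tried out on a small scale. The following sketch is not part of the solution. The generic package counters and the instances integer_lists and character_lists are hypothetical stand-ins for lists, radical_lists, and molecule_lists, and the list type here merely counts its elements. It is meant only to show append and length being resolved by the types of their actual parameters, while the simple name list remains unusable.

```ada
with Text_IO;
procedure use_clause_demo is
   -- Hypothetical stand-in for the generic package lists.
   generic
      type item is private;
   package counters is
      type list is private;
      procedure append (x : in item; to : in out list);
      function length (l : list) return natural;
   private
      type list is record
         count : natural := 0;   -- this sketch only counts the elements
      end record;
   end counters;

   package body counters is
      procedure append (x : in item; to : in out list) is
      begin
         to.count := to.count + 1;
      end append;

      function length (l : list) return natural is
      begin
         return l.count;
      end length;
   end counters;

   package integer_lists is new counters (integer);
   package character_lists is new counters (character);
   use integer_lists, character_lists;
begin
   declare
      -- Subtypes give each list type a distinct, directly visible name.
      subtype integer_list is integer_lists.list;
      subtype character_list is character_lists.list;
      a : integer_list;
      b : character_list;
      -- c : list;   -- rejected: the two declarations of list hide each other
   begin
      append (42, to => a);    -- resolved to integer_lists.append
      append ('x', to => b);   -- resolved to character_lists.append
      Text_IO.Put_Line (natural'image (length (a) + length (b)));
   end;
end use_clause_demo;
```

Uncommenting the declaration of c should make the compiler reject the unit, which is the situation the subtypes radical_list and molecule_list are there to avoid.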


We turn now to the real work of constructing paraffin radicals, 
BCPs, and CCPs. 


Radicals of size i, for i > 0, are generated by first enumerating all 
ordered, nondecreasing, ternary partitions of i - 1. The partitions are 
enumerated in a natural order. (For example, the ordered, nonde- 
creasing, ternary partitions of 7 are enumerated in the order (0,0,7), 
(0,1,6), (0,2,5), (0,3,4), (1,1,5), (1,2,4), (1,3,3), (2,2,3).) Each such 
partition gives the sizes of the three subradicals that need to be at- 
tached (in order) to a carbon atom to obtain a radical of size i; since 
the maximum size of any of these subradicals is i - 1, they are guaran- 
teed to have been created earlier (remember that the radicals are cre- 
ated in order of size). As each such partition is enumerated, we sub- 
ordinately enumerate all lexicographically ordered triples of radicals 
having the indicated sizes. Each such triple gives the actual subradi- 
cals that need to be attached (in order) to a carbon atom to obtain a 
radical of size i. This overall process generates the radicals of size i in 
lexicographic order. 


CCPs of size i are generated by using the same process, with differ- 
ent parameters. First we enumerate all ordered, nondecreasing, qua- 
ternary partitions of i - 1 having elements of size not greater than the 
floor of (i - 1) / 2. Each such partition gives the sizes of the four radi- 
cals that need to be attached (in order) to a carbon atom to obtain a 
CCP of size i; the radicals are guaranteed to have been created earlier 
(remember that the radicals are created before the molecules). As 
each such partition is enumerated, we subordinately enumerate all 
lexicographically ordered quads of radicals having the indicated sizes. 
Each such quad gives the actual radicals that need to be attached (in 
order) to a carbon atom to obtain a CCP of size i. 


In theory, we do not need to enumerate partitions to obtain the 
sizes of the constituent radicals of BCPs of size i, since the only parti- 
tion we need is (i/2, i/2), and it can be obtained simply by constructing 
its sole element. However, we can reuse the software components used 
for generating radicals and CCPs by supplying them with parameters 
that will enumerate all ordered, nondecreasing, binary partitions of i 
having elements of both minimum and maximum size i/2; of course, 
there will be only one. This partition gives the sizes of the two radi- 
cals that need to be attached (in order) to a carbon-carbon bond to ob- 
tain a BCP of size i. Subordinately, we enumerate all lexicographically 
ordered pairs of radicals having the indicated sizes. Each such pair 
gives the actual radicals that need to be attached (in order) to a car- 
bon-carbon bond to obtain a BCP of size i. 


From the preceding discussion, it should be clear that a central 
component of our solution is a procedure that enumerates all the 
ordered, nondecreasing partitions of a given positive integer into a 
given number of elements each bounded by a given minimum and 
maximum. This suggests a procedure with four formal parameters. 
Actually, a fifth parameter is needed: the name of the procedure en- 
capsulating the operation to be applied to each partition as it is enu- 
merated (that operation is the one that enumerates the tuples of radi- 
cals of the sizes given by the partition). Since parameterization by a 
procedure can currently be accomplished in Ada only with the use of 
generic units, we make our partition-enumerating procedure into a 
generic procedure. It is called enum_partitions, and its specification 
is shown in Figure 20. Since the declaration of the generic formal 
subprogram parameter, named apply_to_each, must mention a type 
(that of the array used to pass an enumerated partition to the actual 
subprogram associated with apply_to_each), we make that type an- 
other generic formal parameter, named array_type. In so doing, we 
import into enum_partitions everything it needs to be instantiated for 
an application; it gets nothing else from a global scope or a context 
clause and can therefore be an application-independent library unit. 
So, the generic procedure enum_partitions has two generic formal 
parameters, and the ordinary procedure that is obtained by 
instantiating it has four ordinary formal parameters. The body of 
enum_partitions will be shown later. 


generic
   type array_type is array (positive range <>) of natural;
   with procedure apply_to_each (p : in array_type);
procedure enum_partitions (sum_of_elements : in natural;
                           nbr_of_elements : in positive;
                           min_element     : in natural;
                           max_element     : in natural);


Figure 20 - Specification of the enum_partitions generic procedure 


We can anticipate three instantiations of enum_partitions, for 
generating radicals, BCPs, and CCPs. What procedures will be associ- 
ated with apply_to_each in those three instantiations, and what types 
with array_type? The second part of the question will be answered 
later, when we actually look at the instantiations. The three proce- 
dures we need for the generic actual subprogram parameters all have 
much in common: they take an array specifying a nondecreasing se- 
quence of radical sizes and must enumerate all tuples of lexicographi- 
cally ordered radicals of the corresponding sizes. This suggests a pro- 
cedure with one formal parameter. However, another parameter is 
needed: the name of a procedure encapsulating the operation to be 
applied to each tuple as it is enumerated (that operation is the one 
that actually constructs a radical, BCP, or CCP from the tuple of radi- 
cals and appends it to the appropriate list). Again, we require the use 
of generic units. Thus, the three tuple-enumerating procedures will 
be obtained by instantiating a second generic procedure, which we call 
enum_rad_tuples. Unlike enum_partitions, enum_rad_tuples is very 
much application specific; it needs to access the list of radicals of a 
given size i, that is, r_array(i). The simplest way to give it that vis- 
ibility is to nest it within paraffins (see Figure 15). Its body, a sub- 
unit of paraffins, is shown later. 


We can now turn to the three procedures that paraffins calls: 
generate_radicals_of_size, generate_ccps_of_size, and 
generate_bcps_of_size. Their specifications were in paraffins, and the 
places for their bodies were held there by body stubs. The proper body 
of generate_radicals_of_size is shown in Figure 21. 


with enum_partitions;
separate (paraffins)
procedure generate_radicals_of_size (i : in positive) is
   procedure make_and_append_rad (triple : in three_radicals) is
   begin
      append (radical_made_from(triple), to => r_array(i));
   end;
   procedure enum_subrads_for_rads is
      new enum_rad_tuples (apply_to_each => make_and_append_rad);
   procedure enum_partitions_for_rads is
      new enum_partitions (array_type    => array_of_naturals,
                           apply_to_each => enum_subrads_for_rads);
begin
   enum_partitions_for_rads (sum_of_elements => i-1,
                             nbr_of_elements => 3,
                             min_element     => 0,
                             max_element     => i-1);
end;


Figure 21 - Proper body of the generate_radicals_of_size procedure 


We see there 


• the self-specifying body of a local procedure, 
  make_and_append_rad, that takes an array of three radicals, 
  makes a new radical with them as subradicals, and appends 
  the new radical to r_array(i), where i is the formal 
  parameter of generate_radicals_of_size; 


• an instantiation of the generic procedure enum_rad_tuples 
  (visible because it is declared in the parent unit, 
  paraffins) with make_and_append_rad as generic actual 
  parameter, to yield the ordinary procedure 
  enum_subrads_for_rads; 


• an instantiation of the generic procedure enum_partitions 
  (visible because of the context clause) with 
  enum_subrads_for_rads and array_of_naturals12 as 
  generic actual parameters, to yield the ordinary pro- 
  cedure enum_partitions_for_rads; and 


• the call of enum_partitions_for_rads with appropriate 
  expressions involving i as actual parameters. 


Note that there is only one executable statement in 
generate_radicals_of_size. 


The proper body of generate_ccps_of_size is shown in Figure 22. 
Its contents are analogous to those of generate_radicals_of_size, 
differing only in the procedure make_and_append_ccp at the head of 
the chain of instantiations and in the expressions involving i in the 
call of the procedure at the tail of the chain of instantiations. 


The analogous proper body of generate_bcps_of_size is shown in 
Figure 23. Incidentally, the three procedures just discussed are so 
similar in structure that they could all be obtained by generic 
instantiation, with appropriate generic actual parameters, of one 
generic procedure. Little would be gained, however, so we do not 
pursue that here. 


with enum_partitions;
separate (paraffins)
procedure generate_ccps_of_size (i : in positive) is
   procedure make_and_append_ccp (quad : in four_radicals) is
   begin
      append (molecule_made_from(quad), to => m_array(i));
   end;
   procedure enum_rads_for_ccps is
      new enum_rad_tuples (apply_to_each => make_and_append_ccp);
   procedure enum_partitions_for_ccps is
      new enum_partitions (array_type    => array_of_naturals,
                           apply_to_each => enum_rads_for_ccps);
begin
   enum_partitions_for_ccps (sum_of_elements => i-1,
                             nbr_of_elements => 4,
                             min_element     => 0,
                             max_element     => (i-1)/2);
end;


Figure 22 - Proper body of the generate_ccps_of_size procedure 


with enum_partitions;
separate (paraffins)
procedure generate_bcps_of_size (i : in positive) is
   procedure make_and_append_bcp (pair : in two_radicals) is
   begin
      append (molecule_made_from(pair), to => m_array(i));
   end;
   procedure enum_rads_for_bcps is
      new enum_rad_tuples (apply_to_each => make_and_append_bcp);
   procedure enum_partitions_for_bcps is
      new enum_partitions (array_type    => array_of_naturals,
                           apply_to_each => enum_rads_for_bcps);
begin
   enum_partitions_for_bcps (sum_of_elements => i,
                             nbr_of_elements => 2,
                             min_element     => i/2,
                             max_element     => i/2);
end;


Figure 23 - Proper body of the generate_bcps_of_size procedure 


The total number of executable statements we have shown is still 
surprisingly small. The bulk of the executable statements in our solu- 
tion is to be found in the bodies of the generic procedures 
enum_partitions and enum_rad_tuples, which are all the pieces of our 
solution that remain to be presented. The body of enum_partitions, 
which (as will be recalled) was designed as an application-indepen- 
dent library unit, is shown in Figure 24. The code in enum_partitions 


with min_and_max;
procedure enum_partitions (sum_of_elements : in natural;
                           nbr_of_elements : in positive;
                           min_element     : in natural;
                           max_element     : in natural) is
   package integer_min_and_max is new min_and_max (integer);
   use integer_min_and_max;
   p : array_type (1 .. nbr_of_elements);
   procedure recursively_partition (level        : in positive;
                                    remainder    : in natural;
                                    prev_element : in natural) is
      remaining_levels : constant natural := nbr_of_elements - level;
   begin
      if remaining_levels > 0 then
         for element in
            max(remainder - max_element*remaining_levels, prev_element) ..
            min(remainder - min_element*remaining_levels,
                remainder / (remaining_levels + 1))
         loop
            p(level) := element;
            recursively_partition (level + 1, remainder - element, element);
         end loop;
      else
         p(level) := remainder;
         apply_to_each (p);
      end if;
   end;
begin
   recursively_partition (1, sum_of_elements, min_element);
end;


Figure 24 - Body of the enum_partitions generic procedure 


is tricky, but not profound. An inner procedure, recursively_parti- 
tion, is called to assign the first element of the partition; it calls itself 
recursively to assign successive elements, until the last element is 
reached. At each level of recursion, it computes13 the range (possibly 
empty) of element values acceptable at that level, and it loops14 
through those values (ascending to the next higher level for each in 
turn). At the highest level, when a partition has been enumerated, it 
applies the operation passed parametrically to the partition. 
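
As a check on the description above, the following self-contained sketch wraps the logic of Figure 24 in a small test program and applies it to the partitions of 7 used as an example earlier. It is an illustration under stated assumptions only: the program name partitions_demo, the procedure print, and the instance name enum_triples are hypothetical, and two local functions min and max stand in for the min_and_max package, whose text is not repeated here.

```ada
with Text_IO;
procedure partitions_demo is
   type array_of_naturals is array (positive range <>) of natural;

   generic
      type array_type is array (positive range <>) of natural;
      with procedure apply_to_each (p : in array_type);
   procedure enum_partitions (sum_of_elements : in natural;
                              nbr_of_elements : in positive;
                              min_element     : in natural;
                              max_element     : in natural);

   -- Hypothetical operation applied to each partition: print it on one line.
   procedure print (p : in array_of_naturals) is
   begin
      for k in p'range loop
         Text_IO.Put (natural'image (p (k)));
      end loop;
      Text_IO.New_Line;
   end print;

   procedure enum_partitions (sum_of_elements : in natural;
                              nbr_of_elements : in positive;
                              min_element     : in natural;
                              max_element     : in natural) is
      p : array_type (1 .. nbr_of_elements);

      -- Local stand-ins (an assumption) for the min_and_max package.
      function min (a, b : integer) return integer is
      begin
         if a < b then return a; else return b; end if;
      end min;

      function max (a, b : integer) return integer is
      begin
         if a > b then return a; else return b; end if;
      end max;

      procedure recursively_partition (level        : in positive;
                                       remainder    : in natural;
                                       prev_element : in natural) is
         remaining_levels : constant natural := nbr_of_elements - level;
      begin
         if remaining_levels > 0 then
            -- The range is empty when no acceptable value exists.
            for element in
               max (remainder - max_element * remaining_levels, prev_element) ..
               min (remainder - min_element * remaining_levels,
                    remainder / (remaining_levels + 1))
            loop
               p (level) := element;
               recursively_partition (level + 1, remainder - element, element);
            end loop;
         else
            p (level) := remainder;   -- the last element takes what is left
            apply_to_each (p);
         end if;
      end recursively_partition;
   begin
      recursively_partition (1, sum_of_elements, min_element);
   end enum_partitions;

   procedure enum_triples is
      new enum_partitions (array_type    => array_of_naturals,
                           apply_to_each => print);
begin
   enum_triples (sum_of_elements => 7,
                 nbr_of_elements => 3,
                 min_element     => 0,
                 max_element     => 7);
end partitions_demo;
```

When run, the program should print the eight partitions from (0,0,7) through (2,2,3), one per line, in the order listed earlier for the ternary partitions of 7.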


The proper body of the remaining generic procedure, 
enum_rad_tuples (a subunit of paraffins), is shown in Figure 25. It has 
a structure similar to enum_partitions. An inner procedure, 
recursively_enumerate, is called to assign the first element of the 
radical tuple; it calls itself recursively to assign successive elements, 
until the last element is reached. At each level of recursion, it 
determines the list of radicals available for assignments at that level, 
and it loops through those values (ascending to the next higher level 
for each in turn); that list is the entire list of radicals of the size 
needed for the level, unless the level has the same size as the previous 
level, in which case the list is the remaining portion of the list from 
the previous level, starting from the radical that was assigned at the 
previous level.15 At the highest level, when a tuple has been 
enumerated, it applies the operation passed parametrically to the tuple. 


Finally, it is worth emphasizing that the trees we construct to rep- 
resent radicals and molecules share components. Thus, the subradi- 
cals of a radical do not occupy storage independently from their oc- 
currences as top-level radicals in their own right, or from other oc- 
currences of themselves as subradicals of a different radical. The cre- 
ation of a new radical does not require the copying of subradicals, nor 
does the creation of a molecule require the copying of radicals; they 
require only the allocation of a single new root node, which is initial- 
ized with pointers to the appropriate subnodes. Each radical is, in 
general, pointed to many times. Only one instance of the hydrogen 
radical is created. 
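
The sharing just described can be illustrated with a deliberately tiny sketch. It does not use the radicals or molecules packages; the types node and tree and the objects leaf, small, and larger are hypothetical, chosen only to show that building a new root node stores access values and copies nothing.

```ada
with Text_IO;
procedure sharing_demo is
   type node;
   type tree is access node;
   type node is record
      left, right : tree;
   end record;

   -- One shared leaf, playing the part of the single hydrogen radical.
   leaf   : constant tree := new node'(left => null, right => null);
   -- Each allocation creates exactly one new root node.
   small  : constant tree := new node'(left => leaf, right => leaf);
   larger : constant tree := new node'(left => small, right => leaf);
begin
   -- Equality of access values shows that the subtrees are the same objects.
   if larger.left = small and small.left = small.right then
      Text_IO.Put_Line ("subtrees are shared, not copied");
   end if;
end sharing_demo;
```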


separate (paraffins)
procedure enum_rad_tuples (p : in array_of_naturals) is
   radical_tuple : array_of_radicals (1 .. p'length);
   procedure recursively_enumerate (level        : in positive;
                                    remainder    : in radical_list;
                                    prev_element : in natural) is
      levels_radical_list : radical_list;
   begin
      if p(level) = prev_element then
         levels_radical_list := remainder;
      else
         levels_radical_list := r_array(p(level));
      end if;
      while not is_empty(levels_radical_list) loop
         radical_tuple(level) := head(levels_radical_list);
         if level < p'length then
            recursively_enumerate (level + 1,
                                   levels_radical_list,
                                   p(level));
         else
            apply_to_each (radical_tuple);
         end if;
         levels_radical_list := tail(levels_radical_list);
      end loop;
   end;
begin
   recursively_enumerate (1, r_array(p(1)), p(1));
end;


Figure 25 - Proper body of the enum_rad_tuples generic procedure 


3.2. Parallel Solution 


There are many ways that one could introduce explicit parallelism, 
if one desired, to gain speedup. The most obvious and straightforward 
way is to perform all the iterations of the molecule-constructing loop 
of paraffins in parallel (say, each by a separate task), since the list of 
paraffin molecules of size i is completely independent of the list for 
any other size.16 However, it is not necessary to wait until all the radi- 
cal lists are complete before starting on the molecule lists. Since 
molecules of size i involve radicals of sizes not larger than the floor of 
i / 2, it is possible to start the generation of molecules of size 2i 
(if i > 0) and those of size 2i + 1 (if 2i < n) as soon as the radicals of 
size i are finished. A parallel version of the paraffins function capitaliz- 
ing on this observation is shown in Figure 26. The only additions to 
the declarative part of the function are the specification and body stub 
of a task type, worker_task, whose proper body is shown in the next 
figure. The block statement in the statement list of the function de- 
clares an array, molecule_list_generator, of tasks of the type 
worker_task, all of which wait for a rendezvous with their 
start_on_size entry before proceeding. The block statement then 
sequentially creates the radical lists and, as each is completed, signals 
one or two of the tasks to start on their molecule lists; it does not wait 
at that point for those tasks to complete. Each task continues without 
any further interaction until it completes. The block statement waits 
at its end until all of its dependent tasks have completed; then the re- 
turn statement is finally executed. 
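
The tasking pattern described here (an array of tasks, each started through one entry call, with the enclosing block waiting for all of them) can be isolated in a short sketch. Everything problem-specific is replaced by hypothetical stand-ins: the constant n, the array results, and the squaring that substitutes for the real work of building molecule lists.

```ada
with Text_IO;
procedure worker_demo is
   n : constant positive := 4;   -- hypothetical problem size
   results : array (1 .. n) of natural := (others => 0);

   task type worker_task is
      entry start_on_size (i : in positive);
   end worker_task;

   task body worker_task is
      my_size : positive;
   begin
      -- Wait to be told which size to work on.
      accept start_on_size (i : in positive) do
         my_size := i;
      end start_on_size;
      -- Stand-in for the real work; each task writes only its own component.
      results (my_size) := my_size * my_size;
   end worker_task;
begin
   declare
      workers : array (1 .. n) of worker_task;
   begin
      for i in 1 .. n loop
         workers (i).start_on_size (i);   -- returns as soon as the rendezvous ends
      end loop;
   end;   -- the block waits here until every worker has terminated
   for i in 1 .. n loop
      Text_IO.Put_Line (natural'image (results (i)));
   end loop;
end worker_demo;
```

Because the results are read only after the block has been left, no further synchronization is needed in this sketch; the same reasoning applies to the return of m_array in the parallel paraffins function.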


The proper body of the worker_task task is shown in Figure 27. 
After receiving its signal to proceed, during which it records the size 
of the molecules on which it is to work, the task simply calls 
generate_ccps_of_size to generate the CCPs of its size. Following that, 
if its size is even, it calls generate_bcps_of_size to generate the BCPs 
of that size. Of course, the BCPs and CCPs of a given size are also in- 
dependent of each other and could be constructed in parallel. As it 
stands, our program is not so amenable to that, but one can imagine 
simple changes that would facilitate further enhancement. 


with radicals, molecules;
use radicals, molecules;
function paraffins (n : positive) return array_of_molecule_lists is
   type array_of_naturals is array (positive range <>) of natural;
   use radical_lists, molecule_lists;
   r_array : array_of_radical_lists (0 .. n/2);
   m_array : array_of_molecule_lists (1 .. n);
   generic
      with procedure apply_to_each (tuple : in array_of_radicals);
   procedure enum_rad_tuples (p : in array_of_naturals);
   task type worker_task is
      entry start_on_size (i : in positive);
   end;
   procedure enum_rad_tuples (p : in array_of_naturals) is separate;
   procedure generate_radicals_of_size (i : in positive) is separate;
   procedure generate_ccps_of_size (i : in positive) is separate;
   procedure generate_bcps_of_size (i : in positive) is separate;
   task body worker_task is separate;
begin
   declare
      molecule_list_generator : array (1 .. n) of worker_task;
   begin
      append (hydrogen_radical, to => r_array(0));
      molecule_list_generator(1).start_on_size (1);
      for i in 1 .. n/2 loop
         generate_radicals_of_size (i);
         molecule_list_generator(2*i).start_on_size (2*i);
         if 2*i < n then
            molecule_list_generator(2*i+1).start_on_size (2*i+1);
         end if;
      end loop;
   end;
   return m_array;
end;


Figure 26 - Self-specifying body of the paraffins function 
(parallel version) 


Applicative languages have an obvious advantage over Ada—namely, 
that opportunities for parallelism such as we have seized here (and 
others on a more microscopic scale) are exploited implicitly and au- 
tomatically to the extent permitted by the dataflow properties of the 
problem at hand. 


separate (paraffins)
task body worker_task is
   my_size : positive;
begin
   accept start_on_size (i : in positive) do
      my_size := i;
   end;
   generate_ccps_of_size (my_size);
   if my_size mod 2 = 0 then
      generate_bcps_of_size (my_size);
   end if;
end;


Figure 27 - Proper body of the worker_task task 


4. The Doctor's Office Problem 


Given a set of patients, a set of doctors, and a receptionist, the 
problem is to model the following interactions. Initially, all patients 
are well, and all doctors are in a queue waiting to treat sick patients. 
At random times, patients become sick and enter a queue for treat- 
ment by one of the doctors. The receptionist handles the two queues, 
assigning patients to doctors in a FIFO manner. Once a doctor and 
patient are paired, the doctor diagnoses the illness and, in a randomly 
chosen period of time, cures the patient. Then, the doctor and pa- 
tient return to the receptionist’s desk, where the receptionist records 
pertinent information. The patient is then released until such time as 
he or she becomes sick again, and the doctor returns to the queue to 
await another patient. Any distribution functions may be used for the 
patients’ healthy times and doctors’ cure times, but the code that 
models doctors must have no knowledge of the distribution function 
for patients, and vice versa, and that for the receptionist should know 
nothing of either. 


The doctor’s office problem seems tailor-made for Ada; in fact, the 
solution is so concise that it is greatly eclipsed by our discussion of it. 
The solution employs a collection of tasks interacting in ways that 
model the interactions of the patients, doctors, and receptionist as 
defined in the problem statement; indeed, there is a receptionist task 
and one task for each patient and each doctor. No difficulties are en- 
countered in constructing the model in Ada; in particular, Ada allows 
the task modeling the receptionist to respond directly to interactions 
initiated by either patient or doctor tasks, with no regard to which oc- 
curs first. Some other languages have difficulty responding asyn- 
chronously to signals from a union of dissimilar sources. 


4.1. Characteristics of Alternative Approaches 


This problem can be solved in either of two distinctly different 
ways. One can write a “real-time program,” in which the periods of 
health and sickness are modeled by the suspension of patient (or, in 
the latter case, both patient and doctor) tasks for directly proportional 
periods of real time. Or, one can write a “discrete-event simulation 
program,” in which the time to the next scheduled “event” (such as 
the expiration of a healthy period) is elided. In the former, the inter- 
acting tasks spend most of their time waiting for the expiration of one 
delay or another; and since they will all typically be doing so simulta- 
neously, the program as a whole runs very inefficiently. The latter ap- 
proach is far more efficient, since whenever all the tasks would 
otherwise be waiting for the expiration of various delays, the clock is 
effectively reset instantaneously to the time at which the next delay is 
due to expire. The program as a whole is consequently never in the 
“wait state,” and its total duration is dominated not by the lengths of 
the simulated waits but rather by the number of simulated nonwaiting 
events and the computational resources required to model each one of 
them. 


Both approaches have in common the need to suspend individual 
tasks for some period of time before they can proceed with their next 
action. (In the real-time program, tasks are suspended for a prede- 
termined amount of time that is not influenced by what other tasks do 
during the wait; in the discrete-event simulation, tasks are suspended 
for an amount of time not known in advance, the suspension ending at 
a moment determined by what all the tasks do until then.) Thus, as 
far as the logical behavior of each task is concerned, there is very little 
difference between the two approaches: tasks interact with each other 
and occasionally become suspended until awakened by the expiration 
of a delay or by a rendezvous. Since the interactions and suspensions 
to be modeled are adequately and equally demonstrated by either ap- 
proach, we have chosen the simpler (albeit less efficient) approach in- 
volving a real-time program. The discrete-event simulation would 
have the added complexity of an ordered-time-queue manager, which 
is neither germane to the problem nor instrumental in determining 
the interactions among the other tasks. The real-time program can be 
changed to a discrete-event simulation merely by making systematic, 
local changes at all the places where a delay statement is found and 
adding an ordered-time-queue manager component; the details are 
omitted for lack of space. 


The problem statement does not say anything about when or how 
the modeling of the doctor's office is to end; presumably, such details 
only complicate the essential behavior to be demonstrated. In that 
spirit, we have simplified our solution by ignoring termination ques- 
tions; the modeled interactions continue forever. Termination can, of 
course, be designed in, after one defines appropriate termination cri- 
teria; depending on the criteria adopted, achieving termination might 
require additional communication among some of the tasks. 


4.2. Structure of the Solution 


The solution presented here is in the form of a generic procedure 
called doctors_office. The generic procedure is parameterized by 
two function subprograms, which are random-number generators for 
selecting the random periods of health enjoyed by each patient and 
the intervening random periods of diagnosis and cure (once the pa- 
tient has been assigned a doctor). The specification for the generic 
procedure is shown in Figure 28. To use this machinery, one instanti- 
ates the generic procedure, supplying the names of the two random- 
number generators, to get a (nongeneric) procedure, which one then 
calls with two actual parameters representing the number of patients 
and number of doctors. 


generic
   with function random_healthy_period return duration;
   with function random_treatment_period return duration;
procedure doctors_office (number_of_patients : in natural;
                          number_of_doctors  : in natural);


Figure 28 - Specification of the doctors_office generic procedure 


In the solution, we use the predefined fixed-point type duration 
for the result of the random-number generators, since that is the type 
required for the operand of a delay statement, where the result is 
used. We use the predefined subtype natural of the predefined type 
integer for the number of patients and number of doctors. The body 
of the generic procedure is shown in Figure 29. Places for the bodies 
of the task types patient_task and doctor_task and for the single 
task receptionist are held by body stubs, whose corresponding 
proper bodies are shown later. 


procedure doctors_office (number_of_patients : in natural;
                          number_of_doctors  : in natural) is
   task type patient_task;
   type doctor_task;
   type doctor is access doctor_task;
   task type doctor_task is
      entry identify_self (myself : in doctor);
      entry patient_visiting_for_treatment;
   end;
   task receptionist is
      entry patient_becoming_sick;
      entry doctor_requesting_patient (d1 : in doctor);
      entry patient_requesting_doctor (d2 : out doctor);
   end;
   patient : array (1 .. number_of_patients) of patient_task;
   new_doctor : doctor;
   task body patient_task is separate;
   task body doctor_task is separate;
   task body receptionist is separate;
begin
   for i in 1 .. number_of_doctors loop
      new_doctor := new doctor_task;
      new_doctor.identify_self (new_doctor);
   end loop;
end;


Figure 29 - Body of the doctors_office generic procedure 


Note that the collection of patients is managed differently from the 
collection of doctors. In our solution, patients perform entry calls17 
but have no need to accept them; since no task calls an entry of a pa- 
tient, the identity of any individual patient is never needed. The easi- 
est and most straightforward way of allocating the required number of 
patients is to declare an array of patient_tasks having the appropriate 
bounds; subscripted components of this array are never referenced. 
Doctors, on the other hand, both perform and accept entry calls, so 
there is a need to identify individual doctors. We choose to identify 
them by allocating each one individually, obtaining a pointer to the 
newly allocated doctor_task and then passing that pointer value to the 
task itself (via its identify_self entry), which saves it in a task-local 
variable; no collective record of all the doctors is retained outside of 
them, because none is needed. A doctor passes its own identity to the 
receptionist when it becomes free and requests a patient to treat. The 
receptionist accepts such a call from a doctor only when some patient 
is sick, at which time it passes the doctor’s identity on to the patient, 
who then interacts with the doctor to receive treatment. 
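
The self-identification idiom can be seen in isolation in the following sketch, which is separate from the solution itself. All names in it (helper_task, helper, registry, register, ping, number_of_helpers) are hypothetical; the registry task simply plays the role of any task that receives a helper's identity and later calls one of its entries through that access value.

```ada
with Text_IO;
procedure identity_demo is
   number_of_helpers : constant := 2;   -- hypothetical count

   type helper_task;
   type helper is access helper_task;

   task type helper_task is
      entry identify_self (myself : in helper);
      entry ping;
   end helper_task;

   task registry is
      entry register (h : in helper);
   end registry;

   new_helper : helper;

   task body helper_task is
      me : helper;
   begin
      accept identify_self (myself : in helper) do
         me := myself;            -- save our own identity locally
      end identify_self;
      registry.register (me);     -- hand that identity to another task
      accept ping do
         Text_IO.Put_Line ("helper was called back");
      end ping;
   end helper_task;

   task body registry is
      current : helper;
   begin
      for k in 1 .. number_of_helpers loop
         accept register (h : in helper) do
            current := h;
         end register;
         current.ping;            -- entry call through the access value
      end loop;
   end registry;
begin
   for k in 1 .. number_of_helpers loop
      new_helper := new helper_task;
      new_helper.identify_self (new_helper);
   end loop;
end identity_demo;
```

Unlike doctors_office, this sketch terminates, because every task runs a bounded number of rendezvous; the main procedure still waits at its end for all of its dependent tasks, including those created by the allocator.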


After the allocation and initialization of all the doctors, all the sub- 
sequent action occurs inside the dependent tasks of doctors_office; 
consequently, before returning to its caller, doctors_office waits (at 
its end) for all its dependent tasks to terminate. Since they never do, 
in this implementation, doctors_office never returns. 


4.3. Behavior of Patients 


The code implementing the behavior of patients is shown in Figure 
30. A patient’s behavior is very simple. Repetitively, a patient enjoys a 
random period of health, checks in with the receptionist (announcing 


separate (doctors_office)
task body patient_task is
   assigned_doctor : doctor;
begin
   loop
      delay random_healthy_period;
      receptionist.patient_becoming_sick;
      receptionist.patient_requesting_doctor (assigned_doctor);
      assigned_doctor.patient_visiting_for_treatment;
   end loop;
end;


Figure 30 - Proper body of the patient_task task type 


sickness), requests from the receptionist the services of a doctor, is 
assigned a doctor when one becomes available, and visits the doctor to 
receive treatment. The problem statement says that, after the cure, 
the patient and the doctor are to return to the receptionist; the pa- 
tient does not do so here, simply because there is no relevant action 
(affecting doctors or patients) that the receptionist needs to take on 
behalf of the patient at that point. Depending on the record keeping 
expected of the receptionist, additional interactions with it might be 
desirable. 


As we will see upon examining the detailed behavior of the recep- 
tionist, the call that a patient makes to inform the receptionist of 
sickness does not cause an indefinite suspension of the patient; it 
might cause a very brief suspension (i.e., if the receptionist is cur- 
rently occupied), akin to the wait for a monitor lock. The point to 
note is that the receptionist is designed so that, each time through its 
loop, it is receptive to a call from a patient to its 
patient_becoming_sick entry. On the other hand, a patient might very 
well become suspended when it immediately thereafter calls the 
receptionist at the latter’s patient_requesting_doctor entry, since the 
receptionist might not be in a state in which it is executing (or can 
execute) its accept statement for that entry. The availability of a 
doctor is a prerequisite to the execution of that accept statement. 
Eventually a doctor will become available to treat the patient 
(patients remain suspended not just until a doctor becomes available, 
but also until all prior requests by patients for doctors have been 
satisfied), at which time the patient’s entry call to 
patient_requesting_doctor will result in a rendezvous with the 
receptionist, who will pass to the patient the identity of the available 
doctor. After concluding its patient_requesting_doctor rendezvous 
with the receptionist, the patient will straightaway call the assigned 
doctor at the latter’s patient_visiting_for_treatment entry, remaining 
suspended in a rendezvous until the cure has been effected. 




4.4. Behavior of Doctors 


The code implementing the behavior of doctors is shown in Figure 
31. The repetitive behavior of a doctor, after receiving and storing its 
own identity, is to announce to the receptionist its availability to treat 
patients, then (once a patient has come to it for treatment) to render 
service in the form of a treatment that takes a random period of time. 
The problem statement says that a doctor (as well as its patient) re- 
turns to the receptionist at the conclusion of treatment; the doctor’s 
return to the receptionist, at least, is adequately modeled by our solu- 
tion, because after treating a patient the doctor’s next action is to in- 
form the receptionist immediately that it is again available to treat pa- 
tients. An additional interaction with the receptionist could be pro- 
grammed (at the bottom of the loop) to signal, for record-keeping 
purposes, the doctor’s conclusion of treatment, but there is no rele- 
vant action (affecting doctors or patients) that the receptionist needs 
to take on behalf of the doctor at that point. 


separate (doctors_office)
task body doctor_task is
   self : doctor;
begin
   accept identify_self (myself : in doctor) do
      self := myself;
   end;
   loop
      receptionist.doctor_requesting_patient (self);
      accept patient_visiting_for_treatment do
         delay random_treatment_period;
      end;
   end loop;
end;


Figure 31 - Proper body of the doctor_task task type


As we will see upon examining the detailed behavior of the
receptionist, a doctor might very well become suspended when it calls the
receptionist to request a patient, since the receptionist is not always
receptive to such calls; that happens when no patient is sick and is an
essential part of the behavior to be modeled. Eventually a patient
becomes sick (the doctor remains suspended not just until a patient
becomes sick, but also until all prior requests for patients by doctors
have been satisfied), at which time the doctor’s entry call to
doctor_requesting_patient will result in a rendezvous with the
receptionist, who will receive the identity of the doctor. The
receptionist passes that identity to the patient with whom the doctor is
being paired, and the patient calls the doctor so identified (at the
doctor’s patient_visiting_for_treatment entry). Thus, after completing
its rendezvous with the receptionist, the doctor will treat a patient
without further delay. Finally, note that the delay statement that
causes the doctor’s suspension for the duration of the treatment keeps
the patient suspended for the same time, since that delay statement is
executed as part of the rendezvous between patient and doctor.


4.5. Behavior of the Receptionist 


The code implementing the behavior of the receptionist is shown 
in Figure 32. The behavior of the receptionist is subtle, but straight- 
forward. Repetitively, the receptionist just waits for, and then acts on, 
either a notification that a patient has become sick or a request by a
free doctor for a patient to treat. The receptionist processes one pa- 
tient’s notification or doctor’s request each time through its loop, 
waiting if necessary for one to arrive. It is always receptive to the no- 
tification, but it is receptive to the request only when certain condi- 
tions are satisfied. Specifically, it will accept the request by a doctor 
for a patient to treat only if a patient is available to satisfy the request 
(i.e., only if a patient is sick). If, on some iteration of the loop, no pa- 
tients are awaiting treatment, then the receptionist will remain sus- 
pended until being notified of the arrival of a sick patient, even if 
there is an outstanding request by a doctor for a patient to treat. 




The wait for either of two entry calls is achieved by coding a selec- 
tive wait inside a loop. The second of the selective wait’s two accept 
alternatives is, furthermore, guarded so that, on some iterations of the 
loop (in particular, those in which the guard is false), a call to the 
guarded entry will not be accepted and will remain queued. When the 
guard is true (that is, when the second selective wait alternative is 
“open”), and calls to both of the entries have been made, then either 
entry call may be accepted; the one not accepted will remain pending 
and will be accepted on a subsequent execution of the selective wait 
(i.e., on a subsequent iteration of the loop). The logical behavior and 
the correctness of the program are insensitive to the nondeterminism 
of the choice, which gives one confidence that precisely the right lan- 
guage feature is being employed here. 


separate (doctors_office)
task body receptionist is
   sick_patients : natural := 0;
begin
   loop
      select
         accept patient_becoming_sick;
         sick_patients := sick_patients + 1;
      or
         when sick_patients > 0 =>
            accept doctor_requesting_patient (d1 : in doctor) do
               accept patient_requesting_doctor (d2 : out doctor) do
                  d2 := d1;
               end;
            end;
            sick_patients := sick_patients - 1;
      end select;
   end loop;
end;


Figure 32 - Proper body of the receptionist task


It should be noted that only two selective wait alternatives,
representing two communication channels, are needed in the selective
wait, rather than one for each doctor and each patient. The
receptionist handles its interactions with all the doctors identically,
and likewise its interactions with all the patients. The two selective
wait alternatives are used to discriminate between two reasons why
other tasks might interact with the receptionist, rather than to
identify the interacting tasks. When the identity of a doctor is needed,
that identity is passed as data by the doctor, during its interaction.


If multiple patients are sick and are requesting doctors, their entry
calls remain queued in FIFO order by Ada, and one call is taken out of
the queue for the patient_requesting_doctor entry each time the
accept statement for that entry is executed. Thus, patients are treated
in the order in which they requested doctors, as required by the
problem statement. By the same token, if multiple doctors are trying
to request patients from the receptionist, their entry calls remain
queued in FIFO order by Ada, and one call is taken out of the queue for
the doctor_requesting_patient entry each time the accept statement
for that entry is executed. The problem statement does not specify
which of several available doctors should treat a patient, so our FIFO
assignment of available doctors is only one of many that would have
sufficed; this assignment is the easiest to provide, since it comes for
free. Note that all the queuing required by this problem is performed
implicitly by Ada, and no explicit maintenance of any queues is
required.18


When the receptionist accepts an entry call from a doctor requesting
a patient, which it does only when a patient is known to be sick,
and while it is still engaged in a rendezvous with the doctor, it also
accepts an entry call from a patient requesting a doctor (and performs a
rendezvous with it). The receptionist thus acts as the intermediary
through which information (an available doctor’s identity) is passed
from doctor to patient. Note that, if the receptionist were to accept
an entry call from an available doctor without knowing that a patient is
sick (as would happen, for example, if the second selective wait
alternative were unguarded), then the receptionist could become
suspended at the accept statement for patient_requesting_doctor, and
would indeed do so, if no patient were sick (i.e., no patient_task
were calling that entry). While that would have the desired effect of
causing a doctor requesting a patient to become suspended when no
patient were sick, the request could never be satisfied, because the
receptionist would be unable to take note of a patient’s becoming sick.
In fact, the program would deadlock. The use of a guard on the second
selective wait alternative avoids this while still causing a doctor
requesting a patient to become suspended when no patient is sick.


In summary, the receptionist task is characterized by a loop con- 
taining a selective wait, allowing the task repeatedly to wait for, and 
then respond to, any of several alternative events. Guards are used to 
make it unreceptive to the occurrence of some of those events at 
times, while remaining receptive to the occurrence of others; the lat- 
ter, in fact, determine its receptivity to the former. This is a common 
and important tasking paradigm in Ada. It is worth noting also the use 
of nested accept statements in the receptionist task, which illus- 
trates the power that can be achieved through appropriate coupling of 
tasking constructs (rather than trying to record temporal “state” in 
data). 
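The paradigm summarized above can be tried out in miniature. The following is only a sketch under stated assumptions, not the chapter's program: it assumes an Ada 95 or later compiler, identifies doctors by hypothetical integer numbers instead of access values, lets every patient and doctor act once rather than forever, and adds a terminate alternative (absent from Figure 32) so that the program ends. The name pairing_demo and the printed message are likewise inventions for illustration.

```ada
with ada.text_io;

procedure pairing_demo is

   task receptionist is
      entry patient_becoming_sick;
      entry patient_requesting_doctor (d2 : out positive);
      entry doctor_requesting_patient (d1 : in positive);
   end receptionist;

   task type patient_task;

   task type doctor_task is
      entry identify_self (myself : in positive);
   end doctor_task;

   -- Assumption: two of each, so that every request can be paired.
   patients : array (1 .. 2) of patient_task;
   doctors  : array (1 .. 2) of doctor_task;

   task body receptionist is
      sick_patients : natural := 0;
   begin
      loop
         select
            accept patient_becoming_sick;
            sick_patients := sick_patients + 1;
         or
            when sick_patients > 0 =>
               -- Nested accepts: the doctor's identity is handed to the
               -- patient while the doctor is still in its rendezvous.
               accept doctor_requesting_patient (d1 : in positive) do
                  accept patient_requesting_doctor (d2 : out positive) do
                     d2 := d1;
                     ada.text_io.put_line
                       ("doctor" & positive'image (d1) & " assigned");
                  end patient_requesting_doctor;
               end doctor_requesting_patient;
               sick_patients := sick_patients - 1;
         or
            terminate;   -- added for the sketch only
         end select;
      end loop;
   end receptionist;

   task body patient_task is
      assigned_doctor : positive;
   begin
      receptionist.patient_becoming_sick;
      receptionist.patient_requesting_doctor (assigned_doctor);
   end patient_task;

   task body doctor_task is
      self : positive;
   begin
      accept identify_self (myself : in positive) do
         self := myself;
      end identify_self;
      receptionist.doctor_requesting_patient (self);
   end doctor_task;

begin
   for i in doctors'range loop
      doctors(i).identify_self (i);
   end loop;
end pairing_demo;
```

All output is produced by the receptionist task alone, inside the nested rendezvous, so no two tasks write concurrently. The order in which the two doctors are assigned depends on the order in which their entry calls are queued.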


It can be argued that the solution shown here gives patients,
doctors, and the receptionist too much visibility to things they don’t
need. We hasten to point out that inspection reveals no use of
unnecessary visibility. Nevertheless, one can indeed structure the
solution differently, so that each task sees only what it needs to know.
Such a solution would make additional use of library packages and
generic library packages to encapsulate the behavior of the tasks, with
limited visibility provided by disciplined use of context clauses and by
importation into the generic library packages. The interactions of the
tasks, which are really the essence of this problem, would remain
unchanged.




5. The Skyline Matrix Problem 


The problem is to solve the matrix equation Ax = b, where A is a
skyline matrix of order n and x and b are vectors of order n. Skyline 
matrices are square matrices having varying numbers of leading zeros 
in their subdiagonal rows and supradiagonal columns, and interest in 
such matrices centers on the economies that can accrue from treating 
them as a kind of sparse matrix and not storing the leading zeros. 
Their special structure suggests the use of “ragged arrays”—two-di- 
mensional arrays (matrices) whose rows or columns do not all have the 
same bounds. The main purpose of the problem is to test whether the 
language has ragged arrays or, at least, primitive features that can be 
used to define and implement them. A secondary purpose of the 
problem is to see whether the language permits references to dimen- 
sion-reducing cross sections of multidimensional arrays, such as 
(partial) rows or columns of matrices, and whether it permits opera- 
tions on whole arrays. 


We solve this problem using Doolittle’s method of LU decomposition
[4] (a variation of Crout’s method in which L, rather than U, is
unit triangular). In Section 5.1 we present a solution for ordinary
matrices (not having the skyline property). That solution reveals that
Ada does not have a built-in capability for referencing cross sections
(not to be confused with what Ada calls “slices,” which do not reduce
dimensionality). However, since Ada has array-valued functions, one
can define appropriate cross-section abstractions, at least to the
extent that they are required for this problem.19 Ada also does not
have component-wise arithmetic operations on arrays, but such
operations do not figure in the solution anyway. We use Ada’s ability to
overload operators to define an inner-product operation denoted by
the usual infix operator for multiplication. Then, in Section 5.2, we
replace the type definitions for ordinary matrices by others for skyline
matrices, and we refine the accompanying code to take account of the
special structure of the matrices without obscuring the code’s
underlying ties to Doolittle’s method.


5.1. Solution for Ordinary Matrices 


We present in Figure 33 the specification of a generic package 
called matrices that imports a floating-point type and a positive num- 
ber n and exports types and subprograms for vectors and matrices of 
order n (called n_vector and nxn_matrix, respectively) whose compo- 
nents are of the given floating-point type. The exported function 
solve solves the matrix problem. Different instantiations of this 
generic package are required to obtain matrices of different sizes. 
The need for multiple instantiations could have been avoided by 
defining only unconstrained types for two-dimensional arrays, requir- 
ing the user to supply appropriate bounds in object declarations. 


generic
   type float_type is digits <>;
   n : in positive;
package matrices is
   type vector is array (positive range <>) of float_type;
   subtype n_vector is vector (1 .. n);
   type nxn_matrix is array (1 .. n, 1 .. n) of float_type;
   function row (row_index : positive;
                 of_matrix : nxn_matrix;
                 from      : positive;
                 to        : natural) return vector;
   function col (col_index : positive;
                 of_matrix : nxn_matrix;
                 from      : positive;
                 to        : natural) return vector;
   function "*" (left, right : vector) return float_type;
   function solve (a : nxn_matrix; b : n_vector) return n_vector;
end;


Figure 33 - Specification of the matrices generic package




By providing constrained types or subtypes for these arrays, we sim- 
plify the code by eliminating the need to check for matching bounds. 
We do provide an unconstrained type (named vector) for one-dimen- 
sional arrays to use as the result type of the row and col functions, 
which return partial rows and columns, respectively, and as the 
operand type of the * (inner-product) operator, which accepts partial 
rows and columns as operands. We do not use private types because 
we want the user to be able to exploit the fact that nxn_matrix is a
two-dimensional array—for example, by using the normal subscript 
notation for references to a component of such an object. Also, were 
the auxiliary row and col functions and the * operator meant to be 
used only by the solve function, they could have been defined in the 
body of matrices; we define them, however, in its specification be- 
cause they are likely to be of use elsewhere in the user’s application. 


The body of matrices is shown in Figure 34. The full bodies of the 
auxiliary functions are included, but that of solve, which is somewhat 
larger, is here represented merely by a body stub, and its proper body 
is presented in the next figure. 


The row and col functions have been designed to return a partial
row or column, extending from position from to position to. (Note
that null arrays, that is, arrays having no components, are allowed in
Ada; row and col return null vectors when to < from, as happens at
several places in our realization of Doolittle’s method.) In the function
implementing the * (inner-product) operator for vectors, it is
implicitly assumed, for simplicity of presentation, that the index ranges
of the two operands, left and right, are identical. The inner product of
two null vectors is defined to be zero.
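The treatment of null vectors can be checked in isolation. The sketch below is a stand-alone illustration, not part of the matrices package: it assumes an Ada 95 or later compiler, uses the predefined type float in place of the generic formal float_type, and the names null_vector_demo, v, and part are hypothetical.

```ada
with ada.text_io;

procedure null_vector_demo is
   type vector is array (positive range <>) of float;

   -- Inner product; assumes left and right have identical index ranges.
   function "*" (left, right : vector) return float is
      sum : float := 0.0;
   begin
      for i in left'range loop
         sum := sum + left(i) * right(i);
      end loop;
      return sum;
   end "*";

   v    : constant vector (1 .. 3) := (1.0, 2.0, 3.0);
   part : constant vector := v(1 .. 0);   -- null slice: no components
begin
   -- Number of components in the null vector.
   ada.text_io.put_line (integer'image (part'length));
   -- The loop body never runs, so the inner product is the initial 0.0.
   ada.text_io.put_line (float'image (part * part));
   -- For comparison, the inner product of v with itself.
   ada.text_io.put_line (float'image (v * v));
end null_vector_demo;
```

Because the loop over a null range executes zero times, no special case is needed for null operands; the initial value of sum is simply returned.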




package body matrices is
   function row (row_index : positive;
                 of_matrix : nxn_matrix;
                 from      : positive;
                 to        : natural) return vector is
      result : vector (from .. to);
   begin
      for i in from .. to loop
         result(i) := of_matrix(row_index,i);
      end loop;
      return result;
   end;
   function col (col_index : positive;
                 of_matrix : nxn_matrix;
                 from      : positive;
                 to        : natural) return vector is
      result : vector (from .. to);
   begin
      for i in from .. to loop
         result(i) := of_matrix(i,col_index);
      end loop;
      return result;
   end;
   function "*" (left, right : vector) return float_type is
      sum : float_type := 0.0;
   begin
      for i in left'range loop
         sum := sum + left(i) * right(i);
      end loop;
      return sum;
   end;
   function solve (a : nxn_matrix;
                   b : n_vector) return n_vector is separate;
end;


Figure 34 - Body of the matrices generic package


The heart of Doolittle’s method is, of course, localized in the solve
function, whose proper body is shown in Figure 35. As usual, the l
and u matrices, which are lower and upper triangles in shape, share
the storage of an order-n matrix, lu. (The names l and u are both
made synonymous with lu and play only documentary roles in the
algorithm.) The diagonal of u is that of lu; the diagonal of l, which is
unit triangular, is all ones and is not stored. The triangles l and u are
computed in the factorization step of our version of Doolittle’s method
by alternately filling increasingly higher rows of l and columns of u.
Note how the use of a simple abstraction for partial rows and columns
and a familiar notation for inner products result in a clear and concise
expression of the algorithm. For simplicity of presentation, no test for
singularity is provided, and no provision for the raising of a
user-defined exception signaling singularity is made.


separate (matrices)
function solve (a : nxn_matrix; b : n_vector) return n_vector is
   x, y : n_vector;
   lu   : nxn_matrix;
   l    : nxn_matrix renames lu;
   u    : nxn_matrix renames lu;
begin
   -- Factorization step (compute l and u).
   for i in 1 .. n loop
      for j in 1 .. i-1 loop
         l(i,j) := (a(i,j) - row(i,l,1,j-1) * col(j,u,1,j-1)) / u(j,j);
      end loop;
      for j in 1 .. i loop
         u(j,i) := a(j,i) - row(j,l,1,j-1) * col(i,u,1,j-1);
      end loop;
   end loop;
   -- Forward substitution step (compute y).
   for i in 1 .. n loop
      y(i) := b(i) - row(i,l,1,i-1) * y(1..i-1);
   end loop;
   -- Backward substitution step (compute x).
   for i in reverse 1 .. n loop
      x(i) := (y(i) - row(i,u,i+1,n) * x(i+1..n)) / u(i,i);
   end loop;
   -- Return the result.
   return x;
end;


Figure 35 - Proper body of the solve function for normal matrices
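For reference, the three steps of Figure 35 can be written out component by component. The formulas below are a restatement of the loops in the figure, with subscripted symbols standing for the components of l, u, a, b, x, and y; the first two lines are evaluated for each i = 1, ..., n in turn, and each sum corresponds to one inner product of a partial row with a partial column or vector. An empty sum (upper limit below lower limit) is zero, matching the inner product of null vectors.

```latex
\begin{align*}
l_{ij} &= \frac{1}{u_{jj}} \Bigl( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, u_{kj} \Bigr),
        & j &= 1, \dots, i-1, \\
u_{ji} &= a_{ji} - \sum_{k=1}^{j-1} l_{jk}\, u_{ki},
        & j &= 1, \dots, i, \\
y_i    &= b_i - \sum_{k=1}^{i-1} l_{ik}\, y_k,
        & i &= 1, \dots, n, \\
x_i    &= \frac{1}{u_{ii}} \Bigl( y_i - \sum_{k=i+1}^{n} u_{ik}\, x_k \Bigr),
        & i &= n, \dots, 1.
\end{align*}
```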


It should be pointed out that the type nxn_matrix could have been
defined as a one-dimensional array (with bounds 1..n) whose
components are of the type n_vector. One would have to pick an
interpretation for such a definition as either a vector of row vectors or
a vector of column vectors. Assuming the former for purposes of
illustration, a reference to the (i,j)-th component of u would be written
u(i)(j). Such an interpretation would allow a reference to a whole row
(but not a whole column) of an nxn_matrix. By the same token, a partial
row (but not a partial column) could be denoted by combining
subscripting and slicing. For example, the i-th row of u extending from
position i+1 to position n, as needed in the backward substitution step,
would be denoted by u(i)(i+1..n). Defining nxn_matrix as a vector of
n_vectors would allow one to dispose of either the row function or the
col function, but not both.
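The vector-of-row-vectors alternative can be illustrated with a short stand-alone sketch. It is not the chapter's code: it assumes an Ada 95 or later compiler, fixes n at 3 instead of importing it generically, uses the predefined type float, and the names row_slice_demo and tail are hypothetical.

```ada
with ada.text_io;

procedure row_slice_demo is
   n : constant positive := 3;
   type vector is array (positive range <>) of float;
   subtype n_vector is vector (1 .. n);
   -- A matrix represented as a vector of row vectors.
   type nxn_matrix is array (1 .. n) of n_vector;

   u : constant nxn_matrix := ((1.0, 2.0, 3.0),
                               (0.0, 4.0, 5.0),
                               (0.0, 0.0, 6.0));
   i : constant positive := 1;

   -- Partial row i of u, from position i+1 to position n, obtained by
   -- subscripting followed by slicing; no row function is needed.
   tail : constant vector := u(i)(i+1 .. n);
begin
   -- The (2,3)-th component is written with two subscripts in sequence.
   ada.text_io.put_line (float'image (u(2)(3)));
   -- The slice keeps the bounds i+1 .. n of the positions it came from.
   ada.text_io.put_line
     (integer'image (tail'first) & integer'image (tail'last));
end row_slice_demo;
```

A partial column has no such notation under this representation, which is why the col function would still be required.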


5.2. Solution for Skyline Matrices 


The solution for ordinary matrices is now modified to exploit the
special properties of skyline matrices. First, we replace the generic
package matrices with one called skyline_matrices, whose
specification is shown in Figure 36. In place of the type nxn_matrix, we
now define a type nxn_skyline_matrix, which is a record of two
components, each an array of pointers: the component lower is a vector
of pointers to the rows of the lower triangle, while the component upper
is a vector of pointers to the columns of the upper triangle. Pointers
are used, of course, to permit the rows and columns to be allocated
dynamically with just the right “size” once that size is known
(dynamically allocated storage is accessed through the pointer value
obtained when it is allocated). Each row or column is represented by
an object of the type bounded_vector, which is a discriminated record
containing one component, e, and two discriminants, lo and hi. The
component e is an object of the unconstrained array type vector
(whose definition has not changed) constrained by the bounds lo and
hi, that is, by the values of the discriminants. Each object of the type
bounded_vector is thus self-describing as far as the bounds of the
contained vector e are concerned. The bounds are supplied at the
time of allocation of the object and remain fixed throughout its
lifetime.
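The self-describing nature of a bounded_vector can be seen in a small stand-alone sketch. It assumes an Ada 95 or later compiler and the predefined type float; the type declarations mirror those of Figure 36, while the name bounded_vector_demo, the object column, and the particular bounds and values are hypothetical.

```ada
with ada.text_io;

procedure bounded_vector_demo is
   type vector is array (positive range <>) of float;
   type bounded_vector (lo : positive; hi : natural) is record
      e : vector (lo .. hi);
   end record;
   type vector_ptr is access bounded_vector;

   -- Hypothetical column 5 of an upper triangle whose first component
   -- inside the skyline envelope lies in row 3.  The bounds are fixed
   -- by the discriminant constraint given in the allocator.
   column : constant vector_ptr := new bounded_vector (lo => 3, hi => 5);
begin
   column.e := (3 => 1.0, 4 => 2.0, 5 => 3.0);
   -- The object carries its own bounds in its discriminants.
   ada.text_io.put_line
     (integer'image (column.lo) & integer'image (column.hi));
   ada.text_io.put_line (integer'image (column.e'length));
end bounded_vector_demo;
```

Only three components are stored for this column; rows 1 and 2, which lie outside the envelope, occupy no storage. The sketch does not deallocate the object, a simplification.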


A comparison of Figures 36 and 33 will reveal that we dropped the 
col function and simplified the row function somewhat (i.e., by
changing its parameters). We no longer need the col function because 
all the columns we need to reference are in upper triangles and are 
therefore now just one-dimensional arrays instead of one-dimensional 
cross sections of two-dimensional arrays (to be utterly precise, we 
should say that they are records that contain one-dimensional arrays). 
References to partial columns are easily obtained by using the slicing 
notation on column vectors. Unfortunately, not all of the rows refer- 
enced in Doolittle’s method are rows of lower triangles; if they were, 
we could dispense with the row function for similar reasons. The 
backward substitution step references a row of u. That row cuts across 
the column vectors representing the upper triangle of lu, and it might 
even have some components lying outside the skyline envelope of lu. 
Thus the row function is retained for the sole purpose of extracting a 
row from an upper triangle, and it is particularized to that aim. The 
partial row needed in the backward substitution step always extends to 
the end of the row, so we no longer need to pass the row’s upper 
bound as a parameter. Furthermore, since row is asked to extract a 
row only from the upper triangle of a skyline matrix, we no longer 
pass an entire skyline matrix to it; we merely pass the desired upper 
triangle, represented by the vector of pointers to its columns. 


Turning now to the body of skyline_matrices (Figure 37), we note
the following changes from its predecessor: 




generic
   type float_type is digits <>;
   n : in positive;
package skyline_matrices is
   type vector is array (positive range <>) of float_type;
   subtype n_vector is vector (1 .. n);
   type bounded_vector (lo : positive; hi : natural) is record
      e : vector (lo .. hi);
   end record;
   type vector_ptr is access bounded_vector;
   type vector_ptrs is array (1 .. n) of vector_ptr;
   type nxn_skyline_matrix is record
      lower, upper : vector_ptrs;
   end record;
   function row (row_index : positive;
                 of_cols   : vector_ptrs;
                 from      : positive) return vector;
   function "*" (left, right : vector) return float_type;
   function solve (a : nxn_skyline_matrix;
                   b : n_vector) return n_vector;
end;


Figure 36 - Specification of the skyline_matrices generic package


• The body of the row function now returns a row vector
  containing, in each position, either a component of the
  skyline matrix whose upper triangle is passed as an
  argument (if within the skyline envelope) or zero (if
  outside it). The bounds of the vector it returns could be
  shortened if zeros lie at either end, but that
  optimization hardly seems worthwhile.


• The * operator (inner-product function) can now
  receive a pair of vectors of different lengths; because the
  missing components are zero, they do not contribute to
  the inner product, and the loop can extend from the
  greater of the lower bounds of the two vectors to the
  lesser of their upper bounds.


with min_and_max;
package body skyline_matrices is
   package natural_min_and_max is new min_and_max (natural);
   use natural_min_and_max;
   function row (row_index : positive;
                 of_cols   : vector_ptrs;
                 from      : positive) return vector is
      result : vector (from .. n);
   begin
      for i in from .. n loop
         if row_index >= of_cols(i).lo then
            result(i) := of_cols(i).e(row_index);
         else
            result(i) := 0.0;
         end if;
      end loop;
      return result;
   end;
   function "*" (left, right : vector) return float_type is
      sum : float_type := 0.0;
   begin
      for i in max(left'first, right'first) ..
               min(left'last, right'last) loop
         sum := sum + left(i) * right(i);
      end loop;
      return sum;
   end;
   function solve (a : nxn_skyline_matrix;
                   b : n_vector) return n_vector is separate;
end;


Figure 37 - Body of the skyline_matrices generic package
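The inner product over overlapping bounds can be exercised on its own. The sketch below is an illustration under assumptions rather than the package's code: it assumes an Ada 95 or later compiler, substitutes the attributes integer'max and integer'min for the min_and_max generic package (whose definition is not shown in this section), uses the predefined type float, and the names overlap_product_demo, a, and b are hypothetical.

```ada
with ada.text_io;

procedure overlap_product_demo is
   type vector is array (positive range <>) of float;

   -- Inner product taken over the overlap of the two index ranges;
   -- components outside a vector's bounds are treated as zero and so
   -- contribute nothing.
   function "*" (left, right : vector) return float is
      sum : float := 0.0;
   begin
      for i in integer'max (left'first, right'first) ..
               integer'min (left'last,  right'last) loop
         sum := sum + left(i) * right(i);
      end loop;
      return sum;
   end "*";

   a : constant vector (2 .. 5) := (1.0, 2.0, 3.0, 4.0);
   b : constant vector (1 .. 3) := (5.0, 6.0, 7.0);
begin
   -- Only indices 2 and 3 are common to both: 1.0*6.0 + 2.0*7.0.
   ada.text_io.put_line (float'image (a * b));
end overlap_product_demo;
```

If the two ranges do not overlap at all, the loop range is null and the result is zero, which is the correct inner product when every missing component is zero.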


The revised proper body of the solve function is shown in Figure
38. One small change is that l and u are now made synonymous with
the lower and upper triangular components of lu, respectively, rather
than with the entire object lu, as was done before. This simplifies the
expressions we are required to write to denote components, rows, or
columns of lu (e.g., we write l(i).e(j) to denote the (i,j)-th
component of the lower triangle of lu). Similar renaming declarations
could be given for the lower and upper triangles of a, but there are few
references to a in solve, so the savings would not be significant. The
solve function includes two steps not previously required: one to
allocate the rows and columns of the local lu skyline matrix before
filling them in the factorization step, and one to deallocate them just
before returning the result. It is a consequence of Doolittle’s method
that the skyline envelope of lu is the same as that of a. Thus, we
allocate the i-th row of lu to have the same lower bound as the i-th row
of a and an upper bound of i-1 (recall that the lower triangle stops just
short of the diagonal); similarly, we allocate the i-th column of lu to
have the same lower bound as the i-th column of a and an upper
bound of i (the upper triangle includes the diagonal).


In the factorization step, we see that the two inner loops now start 
at the index of the first nonzero component of the row or column be- 
ing filled, instead of at 1, since preceding components (those outside 
the skyline envelope) do not exist. The backward substitution step 
contains the only remaining reference to the row function. 


Our solution for skyline matrices obviously saves space by not stor- 
ing zeros outside the envelope of a skyline matrix. It also saves time 
by not computing them. Surprisingly, it is more efficient than the so- 
lution for normal matrices in yet another way: it calls far fewer subpro- 
grams to obtain (partial) rows or columns, and in so doing copies 
much less data. 


with unchecked_deallocation;
separate (skyline_matrices)
function solve (a : nxn_skyline_matrix;
                b : n_vector) return n_vector is
   x, y : n_vector;
   lu   : nxn_skyline_matrix;
   l    : vector_ptrs renames lu.lower;
   u    : vector_ptrs renames lu.upper;
   procedure free is
      new unchecked_deallocation (bounded_vector, vector_ptr);
begin
   -- Allocate the rows and columns of lu with the shape of a.
   for i in 1 .. n loop
      l(i) := new bounded_vector (lo => a.lower(i).lo, hi => i-1);
      u(i) := new bounded_vector (lo => a.upper(i).lo, hi => i);
   end loop;
   -- Factorization step (compute l and u).
   for i in 1 .. n loop
      for j in l(i).lo .. i-1 loop
         l(i).e(j) := (a.lower(i).e(j) -
            l(i).e(l(i).lo..j-1) * u(j).e(u(j).lo..j-1)) / u(j).e(j);
      end loop;
      for j in u(i).lo .. i loop
         u(i).e(j) := a.upper(i).e(j) -
            l(j).e(l(j).lo..j-1) * u(i).e(u(i).lo..j-1);
      end loop;
   end loop;
   -- Forward substitution step (compute y).
   for i in 1 .. n loop
      y(i) := b(i) - l(i).e(l(i).lo..i-1) * y(1..i-1);
   end loop;
   -- Backward substitution step (compute x).
   for i in reverse 1 .. n loop
      x(i) := (y(i) - row(i,u,i+1) * x(i+1..n)) / u(i).e(i);
   end loop;
   -- Deallocate the rows and columns of lu.
   for i in 1 .. n loop
      free (l(i));
      free (u(i));
   end loop;
   -- Return the result.
   return x;
end;


Figure 38 - Proper body of the solve function for skyline matrices




In view of the similarities between the two solutions, it is
reasonable to ask whether one can write a single version of solve that is
suitable either for normal matrices or for skyline matrices. The answer is
yes, but we only sketch the details here. One starts by separating the 
solve function from the rest of the matrix abstraction, leaving a 
generic library package called matrices and a generic library function 
called solve. The matrix type would be exported from matrices and 
imported into solve; the two different implementations of the matrix 
type would be provided by two different realizations of matrices. To 
gain the desired degree of reusability, the single version of solve 
would have to be written so that it makes no assumptions about the 
implementation of the matrix type that it imports. Thus, the generic 
formal type by which it imports the matrix type would have to be a 
private type. Subscript and slice notation could then no longer be 
used in solve to denote components, rows, or columns of matrices; 
hence these operations would have to be exported (in the form of sub- 
programs) by matrices, and imported by solve.20 If the rest of the 
client’s application (exclusive of solve) is also to be independent of 
the matrix type, then it, too, would have to use the abstract operations 
exported from matrices. The abstraction could force the client to be 
independent of the details of the matrix type by defining that type as a 
private type. Clearly, the benefits of reusability come at some cost—the 
substitution of explicit operations (and their incurred overheads) for 
predefined operations. 
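What such a reusable solve might look like can be sketched as follows. This is an illustration under several assumptions, not the design the chapter has in mind in detail: it assumes an Ada 95 or later compiler, nests everything in one procedure instead of using library units, imports two hypothetical component operations named get and set in place of row and column operations, uses the predefined type float, and instantiates the generic only for an ordinary 2 x 2 array.

```ada
with ada.text_io;

procedure generic_solve_demo is
   type vector is array (positive range <>) of float;

   -- The matrix type is imported as a private type, so solve may use
   -- only the operations imported with it (get and set are assumed names).
   generic
      type matrix is private;
      n : in positive;
      with function get (m : matrix; i, j : positive) return float;
      with procedure set (m : in out matrix; i, j : positive; value : float);
   function solve (a : matrix; b : vector) return vector;

   function solve (a : matrix; b : vector) return vector is
      lu   : matrix;               -- holds l and u, as in Figure 35
      x, y : vector (1 .. n);
      sum  : float;
   begin
      -- Factorization step, with explicit sums in place of inner products.
      for i in 1 .. n loop
         for j in 1 .. i-1 loop
            sum := get (a, i, j);
            for k in 1 .. j-1 loop
               sum := sum - get (lu, i, k) * get (lu, k, j);
            end loop;
            set (lu, i, j, sum / get (lu, j, j));
         end loop;
         for j in 1 .. i loop
            sum := get (a, j, i);
            for k in 1 .. j-1 loop
               sum := sum - get (lu, j, k) * get (lu, k, i);
            end loop;
            set (lu, j, i, sum);
         end loop;
      end loop;
      -- Forward substitution step (b is assumed to have bounds 1 .. n).
      for i in 1 .. n loop
         sum := b(i);
         for k in 1 .. i-1 loop
            sum := sum - get (lu, i, k) * y(k);
         end loop;
         y(i) := sum;
      end loop;
      -- Backward substitution step.
      for i in reverse 1 .. n loop
         sum := y(i);
         for k in i+1 .. n loop
            sum := sum - get (lu, i, k) * x(k);
         end loop;
         x(i) := sum / get (lu, i, i);
      end loop;
      return x;
   end solve;

   -- One realization of the matrix type: an ordinary two-dimensional array.
   type matrix_2x2 is array (1 .. 2, 1 .. 2) of float;

   function get_2x2 (m : matrix_2x2; i, j : positive) return float is
   begin
      return m(i,j);
   end get_2x2;

   procedure set_2x2 (m : in out matrix_2x2; i, j : positive; value : float) is
   begin
      m(i,j) := value;
   end set_2x2;

   function solve_2x2 is new solve (matrix_2x2, 2, get_2x2, set_2x2);

   a : constant matrix_2x2 := ((2.0, 1.0), (4.0, 5.0));
   b : constant vector (1 .. 2) := (3.0, 9.0);
   -- The exact solution of this system is x(1) = 1, x(2) = 1.
   x : constant vector := solve_2x2 (a, b);
begin
   ada.text_io.put_line (float'image (x(1)) & float'image (x(2)));
end generic_solve_demo;
```

Every component reference in this version of solve is a subprogram call, which makes concrete the overhead mentioned above. A skyline realization would supply a different actual type together with its own get and set.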


But there is another cost as well. In the factorization step of the 
skyline adaptation of solve, we skipped over the computation of com- 
ponents of 1 and u lying outside the skyline envelope, thus saving time 
as well as space. In the “generalized” version being contemplated 
here, we cannot know in advance whether a component lies outside 
the envelope, so we must compute it, even if it turns out to be zero. In 
order to retain the storage savings represented by not storing zeros 
outside the skyline envelope, the procedure (exported by matrices) 
that stores a value into a component of a matrix would be given, in the 


Ada 85 


skyline case, the responsibility for allocating a row (if the component 
being set is in the lower triangle) or a column (if it is in the upper tri- 
angle) upon the first attempt to store a nonzero value into a particular 
component; it would just discard zeros destined for a row or column 
that it has not yet allocated. (Note that pointers are initialized to null 
in Ada, making it easy to detect that a row or column has never been 
allocated.) This responsibility of the component-storing procedure 
meshes well with our realization of Doolittle’s method, since the com- 
ponents of the rows and columns of 1u are set in increasing order— 
that is, it is possible to defer allocating a row or column of 1u until an 
attempt is made to store its first nonzero component. In conjunction 
with this, the row and col functions must be prepared to return null 
arrays if their arguments denote rows or columns that have not yet 
been allocated. Although the responsibility for allocating rows and 
columns can be built into the skyline version of the component-storing 
procedure that would be exported with the matrix type, there is no 
way to handle the “automatic” deallocation of the rows and columns of 
a matrix (in the skyline case) upon leaving the scope of the object 
declaration for the matrix. Of course, Ada allows for such dynamically 
allocated storage to be reclaimed automatically (in this case, when 
solve returns); but as we said earlier in this chapter, such garbage 
collection is not routinely implemented in Ada.²¹ If explicit deallocation
is desired, a procedure to deallocate a matrix could be exported
with the type, imported by solve, and called by solve just before it re- 
turns. In the case of normal matrices, the deallocation procedure 
would do nothing, while for skyline matrices it would have the behav- 
ior shown in Figure 38. 
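The lazy allocation described above can be illustrated outside Ada. The following Python fragment is only a sketch under our own assumptions: the class SkylineMatrix, its methods store and fetch, the 0-based indices, and the choice to keep the diagonal with the columns are all hypothetical and are not part of the matrices package discussed in this chapter. None plays the role of a null access value.

```python
class SkylineMatrix:
    """Hypothetical skyline matrix whose rows and columns are allocated lazily."""

    def __init__(self, n):
        self.n = n
        # None marks a row or column that has never been allocated.
        self.rows = [None] * n   # rows[i] = (first_col, values), strictly lower triangle
        self.cols = [None] * n   # cols[j] = (first_row, values), upper triangle and diagonal

    def _locate(self, i, j):
        # Lower-triangle components live in rows, all others in columns.
        if i > j:
            return self.rows, i, j
        return self.cols, j, i

    def store(self, i, j, value):
        table, outer, inner = self._locate(i, j)
        if table[outer] is None:
            if value == 0.0:
                return  # discard zeros destined for an unallocated row or column
            # First nonzero: allocate from this position up to the diagonal
            # (for rows this leaves one spare slot, kept for simplicity).
            table[outer] = (inner, [0.0] * (outer - inner + 1))
        first, values = table[outer]
        if inner < first:
            # The sketch relies on components being set in increasing order.
            raise ValueError("components must be stored in increasing order")
        values[inner - first] = value

    def fetch(self, i, j):
        table, outer, inner = self._locate(i, j)
        if table[outer] is None or inner < table[outer][0]:
            return 0.0  # outside the envelope
        first, values = table[outer]
        return values[inner - first]
```

A zero stored before the first nonzero of a row leaves that row unallocated, and fetch reports 0.0 for any component outside the envelope, which mirrors the behavior required of the component-storing procedure and of row and col.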


The expense of computing and then discarding zeros outside the 
skyline envelope can, actually, be saved if we are willing to export ad- 
ditional operations with the matrix type, as part of the matrix abstrac- 
tion. That is, we could export a pair of functions that give the lower 
bound of a row or a column of a matrix. They would be used in the factorization
step to skip over components of l and u outside the skyline


86 K. W. Dritz 


envelope (in the skyline case). The rows and columns of l and u
would still be allocated by the component-storing procedure discussed 
above. Thus, in the factorization step, we would not ask for the lower 
bounds of the rows and columns of lu (since they will not have been 
allocated yet); rather, we would ask for the lower bounds of the rows 
and columns of a. In the case of normal matrices, these enquiry func- 
tions would be designed so that they always return 1, of course. 
Clearly, such functions are not very elegant as part of a reusable ab- 
straction, but they do allow us to retain some of the speed economies 
in the skyline case. 
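To make the role of the lower-bound enquiry functions concrete, here is a minimal Python sketch of Doolittle's method that takes them as parameters. It is an illustration under our own assumptions, not the solve procedure of this chapter: the names doolittle, row_lb and col_lb are hypothetical, indices are 0-based (so the "normal" case returns 0 rather than 1), dense lists stand in for the matrix abstraction, and no pivoting is done.

```python
def doolittle(a, row_lb, col_lb):
    """LU-factor a (a list of lists) without pivoting; l has a unit diagonal.

    row_lb(i): first column of row i that may be nonzero (lower triangle).
    col_lb(j): first row of column j that may be nonzero (upper triangle).
    """
    n = len(a)
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                    # row i of u
            if i < col_lb(j):
                continue                         # outside the envelope: known zero
            start = max(row_lb(i), col_lb(j))    # earlier products are all zero
            u[i][j] = a[i][j] - sum(l[i][k] * u[k][j] for k in range(start, i))
        for j in range(i + 1, n):                # column i of l
            if i < row_lb(j):
                continue
            start = max(row_lb(j), col_lb(i))
            l[j][i] = (a[j][i] - sum(l[j][k] * u[k][i] for k in range(start, i))) / u[i][i]
    return l, u
```

For a dense matrix both enquiry functions would simply return 0, and every component is computed; for a skyline matrix they are asked about a, not lu, exactly as described above.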


Finally, the matrix type should really be declared limited pri- 
vate, and not just private, in matrices. The assignment operation 
and the predefined equality comparison operation, which are among 
the few operations implicitly declared for private types, are not im- 
plicitly declared for limited types, and that can be important to us. 
When pointers are used in the implementation of an abstraction, one 
needs to determine whether assignment and equality comparison 
make sense. In our case, they do not. Assignment of one skyline ma- 
trix to another would copy only the contained pointer values; it would 
not replicate the rows and columns of the source value. Because the 
rows and columns of the target would therefore be the same as those 
of the source, after the assignment, changes to one would affect the 
other. By the same token, two skyline matrices with different, but 
equal, rows and columns would not compare equal, because the pre- 
defined operation for equality comparison would compare only the 
pointer values. Sharing of storage is desired in some applications, but 
not here. In a reusable abstraction, it is important to suppress the 
implicit declaration of operations that do not make sense, so that the 
abstraction will not be misused. Declaring the matrix type in matrices 
as limited private will accomplish that. (We were not concerned 
in Section 5.2 with the risks of not making the matrix type limited 
because, as we said there, we specifically wanted the user to be able to 
exploit knowledge of its implementation, which required us to forgo
the use of private types and therefore also limited types; we tacitly assumed
that the user would not misuse that knowledge by, for example,
assigning or comparing skyline matrices.) Although assignment and 
equality comparison of matrices are not needed in solve, no such as- 
sumption should be made by the designer of matrices, if its two ver- 
sions are truly meant to be reused in a variety of applications. Thus, 
the designer of matrices would be compelled to provide those opera- 
tions explicitly as part of the matrix abstraction, using subprograms or 
operator overloading; of course, the designer of matrices could then 
also ensure that the operations have the desired behavior in the sky- 
line case. These considerations dictate that the generic formal type by 
which solve imports the matrix type must also be declared limited 
private, but that is possible precisely because solve does not need 
assignment or the predefined equality comparison operation for the 
matrix type. 
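The hazards of predefined assignment and equality for a pointer-based representation are easy to reproduce in any language with reference semantics. The Python fragment below is only an analogy under our own assumptions: the class Skyline is a hypothetical stand-in, copy.copy imitates an assignment that copies pointer values, and the is test imitates a comparison of pointer values.

```python
import copy


class Skyline:
    """Hypothetical stand-in: the object holds only references to its rows."""

    def __init__(self, rows):
        self.rows = rows  # list of row lists, playing the role of pointers


a = Skyline([[1.0], [2.0, 3.0]])

# "Assignment" that copies only the contained pointer values.
b = copy.copy(a)
b.rows[1][0] = 99.0
seen_through_a = a.rows[1][0]   # the change to b is visible through a

# An explicit copy operation exported by the abstraction would replicate the rows.
c = copy.deepcopy(a)
c.rows[1][0] = -1.0             # a is not affected

# Two matrices with different, but equal, rows.
d = Skyline([[1.0], [99.0, 3.0]])
same_pointers = a.rows[1] is d.rows[1]   # what a pointer comparison looks at
same_values = a.rows == d.rows           # what a user-defined equality should look at
```

Here same_pointers is false although same_values is true, which is the mismatch that declaring the type limited private, and then exporting explicit operations, is meant to prevent.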


Acknowledgements 


The preparation of this chapter was supported by the Strategic 
Defense Initiative Organization, Office of the U.S. Secretary of Defense, 
under PMA E2304. 


Footnotes 


1. The inability to parameterize tasks is especially felt when similar 
tasks are aggregated into arrays and set to work on different parts of a 
problem—a recurring strategy in several of the following problem so- 
lutions. At present, the rendezvous mechanism must be used to com- 
municate with tasks to give them some idea of their own “identity.” 
Mechanisms for parameterizing tasks have been proposed for Ada 9X; 
these have the prospect of eliminating some of the tasking communi- 
cation in our solutions, thereby simplifying them. 


2. This is a simplification, but appropriate for this introduction. 




3. Throughout our code examples, Ada keywords, which are re- 
served words, will be emboldened. This will distinguish them well 
enough from identifiers to allow both to be written in lower case. Ada 
keywords appearing in ordinary text will also be emboldened when- 
ever some kind of punctuation seems appropriate. 


4. What we mean by reusable is as general as possible, so that it can 
be used in a wide variety of applications requiring the same general 
behavior, irrespective of differences in irrelevant details. In Ada, 
reusability is achieved primarily through the definition and instantia- 
tion of generic units, especially generic packages. Streams of values of 
an arbitrary type are an obvious candidate for a reusable abstraction; 
though the type of the values held in a stream might vary with differ- 
ent applications, or even with different streams in a single application, 
the operations on streams are independent of that type. We purposely 
do not show how to program a generic “streams” package in Ada be- 
cause it is highly likely that, in a mature working environment, the 
programmer desiring to solve Hamming’s problem would find that 
someone previously had a need for the stream abstraction and that a 
generic package exporting the desired type and operations (perhaps 
with different names than we have used) already exists in a local li- 
brary. Such a library can be considered to extend the language. 


5. All of the library units said in this chapter to be “assumed to 
exist” were actually written and used for testing the solutions. 


6. All of the integer values used by, and generated in, our solutions to 
Hamming’s problem are greater than zero, except that this 
assumption of positiveness is not made (since it is not required) for 
the subscripts of objects of the unconstrained array type 
positive_array. It is therefore appropriate to use, as we have done,
the subtype positive, instead of the type integer, for all of our 
variables or their components; doing so increases the security of the 
programs since, as a consequence, the procedure solver can not then 
be called with scalar actual parameters having nonpositive values or 
composite actual parameters having nonpositive components, which if 
allowed would cause obscure failures. We have applied this reasoning 
to the fullest extent, though in reality the necessary security would 
have been achieved by declaring just the formal parameters of solver 
to be (or to have components) of subtype positive, and every other 
integer variable or component, including the components of streams,
to be of type integer. (One can prove that, if the actual parameters in
a call to solver, or their components, are all positive, then all integer 
values generated by the solutions will be positive.) 


7. Within the constraints of those dataflow properties, different 
orderings in different runs are caused, of course, by probabilistic char- 
acteristics of the computing load. 


8. Well, almost; one of them would execute min_finder. 


9. Actually, Ada allows for implicit garbage collection, but it also 
provides mechanisms for explicit freeing of dynamically allocated stor- 
age. Had we found implicit garbage collection to be routinely imple- 
mented, we might have been motivated to obtain a solution in the 
spirit of Turner’s; in reality, it has only rarely been implemented in 
Ada systems. Ada solutions significantly different from the present 
one would have required more storage or more time, and certainly 
more code. 


10. Our lexicographic ordering is such that two radicals are lexico- 
graphically ordered if the first is smaller than the second or if they are 
of the same size and the first occurs not later than the second on the 
list of radicals of their common size. 


11. Actually, the only use of either name in one of these places is in 
the proper body of the generic procedure enum_rad_tuples, which is 
one of the subunits of paraffins; see Figure 25. 


12. This unconstrained array type was declared in the parent unit, 
paraffins. We explained in the preceding paragraph why 
enum_partitions was invested with the generic formal type array_type,
with which array_of_naturals is here associated. The
other generic procedure, enum_rad_tuples, needs array_of_naturals
as well; however, since that generic procedure is quite problem specific
anyway, and for that reason is nested within paraffins, it obtains
visibility of array_of_naturals by virtue of that nesting and does not
need to import array_of_naturals by generic parameter association.


13. This computation requires min and max functions, which are ob- 
tained by generic instantiation, with the predefined type integer, of 
the same application-independent generic library package, 
min_and_max, used in the solution of Hamming’s problem. The type
integer is used as the generic actual parameter, rather than a subtype
thereof, like positive or natural, because the argument expressions 
in the present invocations of min and max can yield negative values. 


14. The loop at each level of recursion contrasts with the statically 
nested loops and lack of recursion found in some other solutions; it 
has the advantage that it simulates an arbitrary level of nesting. Of 
course, this strategy was motivated by the desire to write a single 
generic procedure that could serve for the binary, ternary, and qua- 
ternary partitions needed for the problem solution. 


15. This is a subtle consequence of the need to deliver a tuple of lexi- 
cographically ordered radicals. 


16. Admittedly, this strategy does not have good load-balancing prop- 
erties. One could get by with fewer tasks, since one task could create 
several lists of molecules of small size sequentially in the time that it 
takes another task to create a list of molecules of large size. 


17. For convenience throughout the remainder of this section, we 
refer to objects of type patient_task as “patients,” objects of type 
doctor_task as “doctors,” and the single receptionist task as “the 
receptionist” when discussing the behavior of our program units. We 
occasionally use the same words to refer to the people that they 
model, as when we discuss the problem statement. 


18. The solution demonstrated during the conference in 1988 was far
less elegant; it did involve explicit queuing. 


19. There is no getting around the inability to define an array with 
nonunity stride, so the closest one can come to providing cross sec- 
tions is to define a function that returns a (partial) row or column of a 
matrix by copying its components into a unity-stride vector. 
Fortunately, that suffices here. 


20. In addition to a function for fetching a component of a matrix, 
given the matrix and its component indices, a procedure for storing 
into a component of a matrix—given the matrix, its component 
indices, and the value to be stored—would have to be provided by the 
matrix abstraction. 


21. Ada 9X will likely introduce “finalization” of objects or types,
which will solve this problem elegantly.


References 


1. Ada 9X Project. Ada 9X Requirements. Office of the Under Secretary
   of Defense for Acquisition, Washington, D.C., December 1990.

2. Booch, G. Software Engineering with Ada. Benjamin/Cummings,
   Menlo Park, CA, 2nd Edition, 1987.

3. Cohen, N. H. Ada as a Second Language. McGraw-Hill, 1986.

4. Fox, L. An Introduction to Numerical Linear Algebra. Clarendon
   Press, Oxford, 1964.

5. Habermann, A. N. and I. R. Nassi. Efficient Implementation of Ada
   Tasks. CMU Report CMU-CS-80-103, Carnegie Mellon University,
   Pittsburgh, PA, January 1980.

6. Hoare, C. A. R. Monitors: An Operating System Structuring Concept.
   Communications of the ACM, 17(10):549-557, October 1974.

7. Hoare, C. A. R. Communicating Sequential Processes.
   Communications of the ACM, 21(8):666-677, August 1978.

8. Knuth, D. E. The Art of Computer Programming, Volume I.
   Addison-Wesley, 1968.

9. McGraw, J. et al. SISAL: Streams and Iteration in a Single-
   Assignment Language, Language Reference Manual, Version 1.1.
   LLNL Report M-146, Lawrence Livermore National Laboratory,
   Livermore, CA, July 1983.

10. Turner, D. A. The Semantic Elegance of Applicative Languages. In
    Proc. Conf. on Functional Programming Languages and Computer
    Architecture, Portsmouth, NH, October 1981, 85-92.

11. U.S. Department of Defense. Requirements for High Order
    Programming Languages, Steelman. U.S. Government Printing
    Office, Washington, D.C., June 1978.

12. U.S. Department of Defense, Ada Joint Program Office. Reference
    Manual for the Ada Programming Language, ANSI/MIL-STD-1815A.
    U.S. Government Printing Office, Washington, D.C., 1983
    (adopted by ISO as ISO/8652-1987, Programming Languages - Ada).


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved.


The C* Parallel Programming Language 


David L. Andrews 
CASE Center, Syracuse University 
Syracuse, NY 13244 


Eric Barszcz 
Nasa Ames Research Center 
Moffett Field, CA 94035 


1. Background


C*¹ is an extension of the C language designed for Thinking Machines’
SIMD (Single Instruction Multiple Data) computer, the Connection
Machine² [1]. The language design assumes a von Neumann style
front-end computer (such as a VAX³) working in cooperation with an
array of identical processors, each with local memory, as shown in
Figure 1 [8]. The processors in the array simultaneously execute the
same instruction stream while communicating data among themselves
and with the front end. C* assumes the same abstract machine
model as C, sharing three features: 1) a uniform address space,
2) sequential execution, and 3) pointer arithmetic. The only difference in
the models arises from C*’s execution of the instruction stream by 
many processors instead of just one. This difference is a key feature of 
the C* language which seeks to assign a processor to every data item 
of interest. Just as a well written C program scales with the size of ar- 
rays, C* scales by assuming a “virtual” processor is assigned to every 
point in an array. When more virtual processors are required than the 
actual number of physical processors requested, multiple virtual processors
are assigned to each physical processor. (For the remainder of
the chapter, we use the terms “virtual processors” and “processors”
interchangeably unless explicitly stated otherwise.) The number of virtual
processors that can be assigned to a physical processor is limited only by
local memory.


Figure 1 - The Connection Machine (labels recovered from the diagram: Front End, Front End Memory, Instruction Bus)


1.1. Language Design 


The C* programming language adheres to the SIMD philosophy of a 
single instruction stream and provides a simple control mechanism 
that remains independent of the number of physical or virtual process- 
ing elements. C* extends the standard C conditional instructions to 
allow a programmer to enable and disable individual processing ele- 
ments during execution. These extensions are based on just three 
new language features: 


• A poly type attribute that identifies parallel data


• A domain feature for organizing parallel data, similar in
  syntax and semantics to the C struct and the C++ class


• A selection statement for activation and deactivation of
  parallel execution




Most of the parallel power of the language arises from the applica- 
tion of existing C operators and statements to the new parallel data 
type poly. For example, the arithmetic operation a * b specified on
the two poly operands a and b will cause all enabled processing elements
to perform the multiplication. Further, if 16k processing elements
are enabled and contain an a and b, 16k multiplications will be
performed simultaneously. C* preserves the interchangeability of ar- 
rays with pointers, a feature central to the C language. All interpro- 
cessor communications are expressed in terms of assignment and 
pointer indirection. No new expression operators are required for 
parallel execution, although two operators for computing the maxi- 
mum or minimum of parallel operands are introduced. 


1.2. Sequential and Parallel Data Representations 


Data is represented in C* as either scalar or parallel. Scalar data is 
identified by the keyword mono and exists in the memory of the front- 
end host computer. Parallel data is identified by the keyword poly 
and exists in local processor memory in the processor array. Data de- 
clared within a domain is by default poly, whereas data declared out- 
side a domain is by default mono. The keyword domain is used in two 
different contexts: 1) analogous to the C keyword struct for declar- 
ing and defining operands, and 2) to scope portions of a C* program 
signifying where processors containing operands of that domain type 
are to be activated. C* is subject to the same declaration, definition
and initialization rules as C. Some example declarations are: 


    mono int uno;           /** a scalar integer residing in the
                                host's memory **/

    poly int ddata;         /** multiple integers residing in the
                                processors' local memories **/

    mono int *poly ddata;   /** multiple pointers on the array
                                processor, all pointing to a scalar
                                in the host's memory **/

    poly int *poly ddata;   /** multiple pointers on the array
                                processor pointing to parallel
                                data existing in the array processor **/


1.3. Expressing Basic Computations 


All C* code is separated into two types: scalar and parallel. Code 
belonging to a domain is parallel code and is executed on the proces- 
sor array. All other code is serial and is executed on the front-end 
host processor. The two code types are written using identical syntax 
and are distinguished only by syntactic context. All standard C ex- 
pression operators and all standard C statement types may be used in 
parallel code in the same manner as they are used in serial code. 


1.4. Expressing Parallel Computations 


Synchronous parallel execution of the same instruction by many 
data processors can occur only if each processor executing the in- 
struction contains the same memory layout. The C* structure type 
that describes the memory of a data processor is called a domain. 
Two example domain declarations are shown below. The associated 
memory layout for domain Enum is shown in Figure 2. 


    /** sample declaration **/
    domain Enum {
        int number;
        domain Enum *dptr;
    };

    /** sample declaration **/
    domain Paraffin {
        char *wax;
        int a;
    } Pe_Paraffin[100];


Figure 2 - Memory and pointer configuration for domain Enum. 


domain Enum declares an integer and a pointer to reside in the lo- 
cal memory of all processing elements to be executing in domain 
Enum as shown in Figure 2. domain Paraffin declares a character 
pointer and an integer to reside in each local memory of the 100 pro- 
cessors assigned to execute in domain Paraffin. A specific instance 
of any field declared within a domain may be accessed by specifying 
the processor array index, and the field using the C syntax for access- 
ing a field in a struct. For example, Pe_Paraffin[50].wax specifies 
character string wax residing in (logical) processor 50’s local memory. 


1.4.1. New Operators 


C* takes full advantage of the compile time distinction between se- 
rial and parallel data, extending existing C operators through overload- 
ing to operate on parallel data. Only two new operators were added, 
the minimum operator <? and the maximum operator >?. Together,
the new operators and the extended set of standard C
operators allow data movement between processors in the following
ways:


• reading: fetching one value from a particular data
  processor to the front end


• writing: storing one value from the front end into
  a particular data processor


• replication: broadcasting a value from the front end
  to all parallel processors


• reduction: reading values from all parallel processors,
  delivering a combined result


• permutation: communication among data processors
  in both regular and irregular patterns
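The five kinds of data movement can be pictured with an ordinary list standing in for one parallel variable. The Python lines below are only a sequential sketch under our own assumptions: the names poly, front, broadcast and shifted and the sample values are hypothetical, with one list entry per (virtual) processor and plain variables living on the front end.

```python
# One list entry per (virtual) processor; plain variables live on the front end.
poly = [3, 1, 4, 1, 5, 9, 2, 6]
n = len(poly)

front = poly[2]               # reading: one value from a particular PE to the front end
poly[5] = 7                   # writing: one value from the front end into a particular PE
broadcast = [front] * n       # replication: a front-end value sent to every PE
total = sum(poly)             # reduction: all PEs combined with +
largest = max(poly)           # reduction: all PEs combined with the maximum operator
# permutation: every PE fetches the value held by its left neighbour (with wraparound).
shifted = [poly[(i - 1) % n] for i in range(n)]
```

On the Connection Machine each of these would be a single parallel operation; here the loops and built-ins merely reproduce the result.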


As a simple example of a reduction operation, consider the following 
program fragment: 


    /** similar to struct **/
    domain Pebble_Beach {
        int hole_score;    /** a parallel variable **/
    } Pebble_beach_hole[18];

    main() {
        /** reset the round score **/
        int round_score = 0;
        /** activate all 18 PE's of Pebble_beach_hole **/
        [domain Pebble_Beach].{
            /** sum reduction to front end **/
            round_score += hole_score;
        }
    }


The code within the scope of domain Pebble_Beach is executed
only by processors assigned to domain Pebble_Beach. The variable
round_score is declared outside of the domain and is by default a
scalar value residing on the host front end. The familiar C syntax +=
now assigns to round_score its current value plus the value of
hole_score, but hole_score is a parallel variable. The assignment operation
round_score += hole_score will actually sum all values of the
parallel variable hole_score from all active processors and add this
summed total to the existing value of round_score on the front end. It
should be noted that the parallel values are added together “as if in
some sequential order,” but the order is not actually known. This subtlety
should be kept in mind when performing certain numerical
computations where the order of addition is critical.
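Why the unknown order matters is easiest to see with floating-point operands. The following Python lines are a small illustration of our own (the values are chosen for the purpose and have nothing to do with golf scores): the same three numbers are added in two different orders.

```python
values = [1e16, 1.0, -1e16]

# Add in the order given: the 1.0 is lost when it is added to 1e16,
# because neighbouring doubles near 1e16 are 2 apart.
left_to_right = (values[0] + values[1]) + values[2]

# Add the two large values first: they cancel exactly, and the 1.0 survives.
other_order = (values[0] + values[2]) + values[1]
```

The two results differ (0.0 and 1.0), so a floating-point sum reduction whose order is not specified may not be reproducible from one run or machine configuration to the next. Integer sums, as in the round_score example, are not affected.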


Suppose we wanted to find the maximum value of hole_score in- 
stead of summing all values. The maximum operator >?= provided by 
C* performs this operation as shown in the code below. 


    domain Pebble_Beach {
        int hole_score;
    } Pebble_beach_hole[18];

    main() {
        /** reset front end variables **/
        int max_strokes_on_hole = 0;
        [domain Pebble_Beach].{
            /** return the largest value of parallel variable **/
            max_strokes_on_hole >?= hole_score;
        }
    }


1.5. Processor Selection 


Processors are activated by entering a domain. Processors are de- 
activated within a domain using familiar conditional C operators. Each 
enabled processing element in the processor array performs the conditional
test independently and either remains active or deactivates itself
based on the test outcome. Suppose we wanted to count the number
of holes for which the score was equal to the value of
max_strokes_on_hole. The code fragment for performing this operation is
shown below. All processors execute the conditional, but only processors
for which the conditional is true execute the code within the
conditional. The program will return a count of the number of processors
whose hole_score equals the largest value, max_strokes_on_hole.


    domain Pebble_Beach {
        int hole_score;
    } Pebble_beach_hole[18];

    main() {
        int max_strokes_on_hole = 0;
        int number_duffed_holes = 0;
        /** activate all 18 pe's **/
        [domain Pebble_Beach].{
            /*** find the worst hole ***/
            max_strokes_on_hole >?= hole_score;
            /** count pe's hole_score = max_strokes_on_hole **/
            if (hole_score == max_strokes_on_hole) {
                /** count the number of duffed holes **/
                number_duffed_holes += (poly)1;
            }
        }
    }


Processing elements within a domain may refer to themselves by
the keyword this, which represents a processing element’s absolute address.
A processor’s relative address may be found by subtracting
the absolute address of the first processor from its own absolute address. The
code below assigns to the variable highest_scored_hole the largest
relative address of all processors whose hole_score variable is equal to
max_strokes_on_hole.


    domain Pebble_Beach {
        int hole_score;
        int hole_number;
    } Pebble_beach_hole[18];

    main() {
        /** reset front end variables **/
        int max_strokes_on_hole = 0;
        int highest_scored_hole = 0;
        [domain Pebble_Beach].{
            /** compute all relative addresses in parallel **/
            hole_number = this - &Pebble_beach_hole[0];
            /** return the largest value of parallel variable **/
            max_strokes_on_hole >?= hole_score;
            /** select all pe's whose
                hole_score = max_strokes_on_hole **/
            if (hole_score == max_strokes_on_hole)
                /** select the pe with the largest hole_number **/
                highest_scored_hole >?= hole_number;
        }
    }
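The selection examples of this section can be traced on a conventional machine. The Python fragment below is a sequential sketch under our own assumptions: the scores are made up, the list active is a hypothetical model of the set of enabled processing elements, and list positions stand for relative addresses. It is not C* and says nothing about how the Connection Machine implements these steps.

```python
# Made-up scores, one per processing element (hole).
hole_score = [4, 5, 3, 7, 4, 4, 5, 7, 4, 3, 4, 5, 4, 7, 4, 5, 3, 4]

# Entering the domain: all 18 PEs start out active.
active = [True] * len(hole_score)

# max_strokes_on_hole >?= hole_score  (max-reduction over the active PEs)
max_strokes_on_hole = 0
max_strokes_on_hole = max(
    [max_strokes_on_hole] + [s for s, on in zip(hole_score, active) if on]
)

# if (hole_score == max_strokes_on_hole): each PE tests its own value
# and stays active only if the test succeeds.
active = [on and s == max_strokes_on_hole for s, on in zip(hole_score, active)]

# number_duffed_holes += (poly)1  (sum-reduction of 1 over the PEs still active)
number_duffed_holes = sum(1 for on in active if on)

# highest_scored_hole >?= hole_number  (max-reduction of the relative addresses)
highest_scored_hole = max([0] + [k for k, on in enumerate(active) if on])
```

With these scores the worst hole costs 7 strokes, three processing elements survive the conditional, and the largest relative address among them is 13.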


2. Hamming’s Problem Extended 


Hamming’s problem extended maps well to the Connection
Machine’s SIMD architecture, and is straightforward to express in C*.
The problem may be stated as follows: given a set of primes {a, b, c, ...}
of unknown length and an integer N, output in increasing order all product
terms of the form


$a^i \cdot b^j \cdot c^k \cdots \le N.$


To efficiently implement Hamming’s problem extended on the 
Connection Machine, all possible arithmetic computations that can be 
performed in parallel need to be exposed to take full advantage of the 
large number of processors contained in the processor array. To ex- 
pose the parallelism inherent in the problem consider the powers of 
the primes a and b, 


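$A = \{\, a^i \mid a^i \le N,\ i = 0, 1, \ldots \,\}$


$B = \{\, b^i \mid b^i \le N,\ i = 0, 1, \ldots \,\}$,


respectively. Then the convolution of sets A and B is a set of partial
values for the solution of Hamming’s problem extended,


$\mathrm{Conv} = \{\, a^0 b^0, \ldots, a^0 b^j, a^1 b^0, \ldots, a^1 b^j, \ldots, a^i b^0, \ldots, a^i b^j \,\},\quad a^i b^j \le N.$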


Figure 3 - Processor Memory Layout for Hamming’s Problem


A new set of partial values is obtained for each next prime by con- 
volving the elements of set Conv with the power set of the prime. 
After this operation has been performed on all primes, the set Conv 
will contain the unsorted elements that are the solution to Hamming’s 
problem extended. This list can be quickly sorted on the Connection 
Machine to obtain the solution to Hamming’s problem extended in as- 
cending order. Figure 3 shows the distribution of prime number val- 
ues in the processors of the Connection Machine. Each processor re- 
ceives a unique combination of the primes and computes a single con- 
volution output point. 
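The overall strategy (expand each prime into its powers, convolve with the partial results, filter against N once at the end, and sort) can be written down sequentially. The Python function below is only a sketch under our own assumptions: the name hamming_extended is hypothetical, the list comprehension does one product at a time where the Connection Machine would assign one processor per product, and the built-in sorted stands in for the parallel sort.

```python
def hamming_extended(primes, n):
    """Return, in increasing order, all products of powers of the primes that are <= n."""
    conv = [1]                      # the set Conv before any prime is processed
    for p in primes:
        # Powers p**0, p**1, ... that do not exceed n.
        powers = []
        value = 1
        while value <= n:
            powers.append(value)
            value *= p
        # Convolution: every partial value times every power (one PE each).
        conv = [c * q for c in conv for q in powers]
    # The products are checked against n only once, at the end, then sorted.
    return sorted(v for v in conv if v <= n)
```

For example, hamming_extended([2, 3, 5], 20) yields the fourteen values 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20. Because the filter is applied only at the end, conv also holds products larger than N in the meantime, which is the price paid for keeping every step uniform across processors.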


The C* program first computes the partial results for each prime
number input from the list of primes in domain Primemult. A unique
$a^i$, $i = 0, \ldots, j$, $a^j \le N$, is computed in each processor and placed in local
memory. The computation of all $a^i$’s is performed in $\log_2(\log_a N)$ time,
as shown in Figure 4, and is expressed in C* by


    /** initialize scalar variables **/
    pwrtwo = 1;
    max = 0;

    /** within domain Primemult **/
    /** compute each PE's address **/
    int addr = (int) (this - &Prime_Procs[0]);

    while ((max >?= prime_number) <= N) {
        if (addr >= pwrtwo)
            prime_number *= Prime_Procs[addr-pwrtwo].prime_number;
        pwrtwo *= 2;
    } /** end while **/


Figure 4 - Computation of a**i values, a**i < N (processors Pe[0], Pe[1], ..., Pe[6], ...)


The >?= operator returns the largest value in memory location
prime_number for all enabled processing elements. The loop will exit
when an $a^i$ is computed that is greater than the integer N, allowing arbitrary
values of N and prime_number to be used without specifying loop
limits. The number of $a^i$ values computed is determined by querying
only those processing elements with computed prime_number values
less than or equal to N, and using the >?= operator on their addresses.
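The doubling loop can be simulated step by step. The Python function below is a sketch under assumptions of our own that the text does not state: processor 0 is taken to start with the value 1 and every other processor with the prime a, enough processors are allotted that the largest power exceeds N (so the loop is guaranteed to exit), and the copy named old models the SIMD rule that all processors read their operands before any of them writes. The name prime_powers is hypothetical.

```python
def prime_powers(a, n):
    """Simulate the doubling loop; return (powers of a that are <= n, number of steps)."""
    # Assumption: allot enough PEs that a**(num_pes - 1) exceeds n.
    num_pes = 1
    while a ** (num_pes - 1) <= n:
        num_pes += 1
    # Assumption: PE 0 starts with 1, every other PE with the prime a.
    prime_number = [1] + [a] * (num_pes - 1)
    pwrtwo = 1
    steps = 0
    while max(prime_number) <= n:            # (max >?= prime_number) <= N
        old = list(prime_number)             # all PEs read before any PE writes
        for addr in range(num_pes):          # one iteration per PE
            if addr >= pwrtwo:
                prime_number[addr] = old[addr] * old[addr - pwrtwo]
        pwrtwo *= 2
        steps += 1
    return [v for v in prime_number if v <= n], steps
```

Under these assumptions processor k ends up holding a**k, and the number of passes through the loop grows with the logarithm of the number of powers rather than with the number of powers itself. For a = 2 and N = 100 the simulation takes three steps and returns the seven values 1, 2, 4, 8, 16, 32, 64.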


After each prime number has been expanded, $a^0, a^1, a^2, \ldots, a^j \le N$,
the values are distributed throughout the processor array in domain 
PrimeExpand as shown in Figure 5. The C* code that performs this 
operation is given by 


    /** maxpe == number of primes < N **/
    /** rep == current size of the set Conv **/
    modaddr = offset;
    for (indx = 0; indx <= maxpe; indx++) {
        frontprime = Prime_procs[indx].primenum;
        if (modaddr < (indx*rep + rep) && modaddr >= rep*indx)
            temp *= frontprime;
    }


where each $a^i$ value to be distributed is first transferred from processor
Prime_Procs[indx] in domain Primemult to the front end and rebroadcast
into a range of processors in domain PrimeExpand specified
by the


    if (modaddr < (indx*rep + rep) && modaddr >= rep*indx)


statement, as shown in Figure 5. Note that at this point in the program,
the computed product values are not being checked against N to
determine if the value is greater than our limit. After all primes have
been input, expanded in domain Primemult, distributed and the par- 
tial values updated in domain PrimeExpand, all values of interest less 
than N can be identified by conditionally selecting processors whose 
product values are less than N, in a single conditional operation. The 
selected values are then passed to a sort routine that returns the val- 
ues in ascending order. 


domain Primemult performs the multiplication of each
$a^i$, $i = 0, 1, \ldots, j$, $a^j \le N$,
in $O(\log_2[\log_a N])$. domain PrimeExpand performs the distribution of
the $a^i$’s computed in domain Primemult and the multiplication of all L
input prime numbers in $O(L \cdot \log_a N)$, with a given by the smallest
prime input. The computed values are sorted using a Batcher sort
with computational complexity


$O(\log^2 M)$, $M = (\log_a N \cdot \log_b N \cdots)$,


the cardinality of the set Conv. Therefore, the worst case computational
complexity of the algorithm is given by the step that performs
the multiplication of all L input prime numbers, and is $O(L \cdot \log_a N)$.


Figure 5 - Distribution of the Computed Prime Values into Memory


Further, this fairly efficient algorithm was easily expressed in C*. The 
prime values were streamed in from a data file terminating when EOF 
(end of file) was recognized by the while loop, allowing arbitrarily 
long lists of prime numbers to be input without requiring any memory 
management. 


3. Paraffins Problem


The algorithm developed for solving the Paraffins problem in C* is 
partially based on Turner [4]. Each paraffin molecule is represented 
by a list of integers. Each integer in the list represents a submolecule.
As an example, methane is represented as [1, 3]
and one of several possible representations for propane is [3, 3, 1, 1].
The following function by Turner generates a list containing all paraffin
molecules with n carbon atoms:


paraffin n = quotient equiv {[x,3] | x <- para (n-1)} 
quotient f (a:x) = a: {b | b <- quotient f x; ~ fab } 
quotient f [] = [] 

para 0 = [1] 


para (n) {f{a,b,c] | i ,3,k <- [0, ..., n-1]; i+j+k = n-1; 


a <- para i; b <- para 3; c <- para (n-1-i-j)} 


The C* algorithm starts by computing all values of para(n) in
parallel in domain Enum. This is accomplished by forming all
3-selections [2] of the set

$\{\, [a, b, c] \mid 0 \le a, b, c \le n-1;\ a + b + c = n-1 \,\}.$

Each processor computes a single

$[a_i, b_i, c_i], \quad a_i + b_i + c_i = n-1, \quad 0 \le a_i, b_i, c_i \le n-1$

based on the processor's relative address, using modulo arithmetic.
The following code performs this operation:


    index = n;                            /** para(n) **/
    offset = (int) (this - &EnumPe[0]);
    if (offset <= max_size) {             /** max_size = index^3 **/
        a = offset % index;               /** mod a address **/
        mod_b = offset / index;           /** mod b address **/
        b = mod_b % index;                /** mod b address **/
        c = index-1-a-b;
    }                                     /** end if **/




Figure 6 - Processor Memory Layout for Enumeration 


All $[a_i, b_i, c_i]$'s are computed in parallel in constant time. The
number of processing elements required to perform the parallel
enumeration is given by max_size, defined at run time. Execution of
this code fragment is graphically illustrated in Figure 6.
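The address arithmetic can be emulated sequentially by letting a loop index play the role of the processor offset. This is a hedged Python sketch, not the C* code: the function name enumerate_triples is hypothetical, and it assumes n*n simulated processors (one per (a, b) pair) instead of the max_size bound used in the fragment above.

```python
def enumerate_triples(n):
    """Emulate the per-processor computation: one (a, b, c) candidate per offset."""
    triples = []
    for offset in range(n * n):      # one iteration per simulated processor
        a = offset % n               # low "digit" of the offset in base n
        b = (offset // n) % n        # next "digit" of the offset
        c = n - 1 - a - b            # forced by a + b + c = n - 1
        if c >= 0:                   # processors with c < 0 hold no valid selection
            triples.append((a, b, c))
    return triples


print(enumerate_triples(3))
```

Each simulated processor does a fixed amount of work that depends only on its own offset, which is why the parallel version runs in constant time.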


The next step in the C* algorithm performs a histogram of each
$[a_i, b_i, c_i]$ representing the submolecules that form a particular
para(n). All processors perform the histogram function in parallel by
executing the following code in domain Paraffin:


    para[a] += 1; para[b] += 1; para[c] += 1;


Figure 7 shows the outcome of this operation for $1 \le N \le 5$. para[r]
now represents the number of digits in $[a_i, b_i, c_i]$ equal to r; i.e.,
para[0] equals the number of digits that equal zero, para[1] equals the
number of digits that equal 1, etc. With only three digits in
$[a_i, b_i, c_i]$, the maximum value any para[i] can take is 3.
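For a single processor, the histogram step amounts to three increments. The following Python sketch shows it for one triple; the function name histogram is hypothetical and the loop replaces the three explicit statements of the C* fragment.

```python
def histogram(triple, n):
    """Count how many of the three digits of the triple equal each value 0..n-1."""
    para = [0] * n
    for digit in triple:
        para[digit] += 1
    return para


print(histogram((0, 1, 1), 3))
```

Since exactly three increments are applied, the entries of para always sum to 3, which gives the bound stated above.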


The array para is expanded in domain Paraxpand by copying each
processor's para array from domain Paraffin into a range of processors
in domain Paraxpand using pointer indirection. The number of times
each para array is copied into domain Paraxpand is equal to the number
of possible combinations of the submolecules used in forming the
paraffin molecule. Figure 7 shows that two submolecule combinations
are possible for n = 2. All paraffin molecules that contain n = 2 as a
submolecule will have nonzero entries for their para[2]. The code that
performs this operation is:




Figure 7 - Para Array Configuration 


    [domain Paraxpand].{
        mono int index;
        mono int upper;
        domain Paraffin *transptr;
        base = 0;
        pe_num = this - &Pxpnd_Pe[0];       /** relative Pe offset **/
        for(index = 0; index < count; index++) {
            /** count = # pe's **/
            /** pointer to domain Paraffin **/
            transptr = &ParaPe[index];
            if(pe_num == 0)                 /** first elements pointer **/
                /** expand set in domain Paraffin **/
                upper = transptr -> expand;
            if((pe_num < base + upper) && (pe_num >= base)) {
                rep = index;
                /** pelem == i, 0 < i < N **/
                carbon_number = transptr -> pelem;
                for(loop = 0; loop < N; loop++)
                    parray[loop] = transptr -> para[loop];
            } /** end if **/
            base += upper;
        } }


Note that parallel reads are occurring between the two domains using 
the poly pointer transptr in domain Paraxpand. A for loop se- 
quences through the elements of para, transferring an element at a 
time into domain Paraxpand. The expanded para array in domain 
Paraxpand is shown in Figure 8. 
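The base/upper bookkeeping in the expansion loop can be illustrated with a sequential emulation in which each simulated Paraxpand processor scans the source records and keeps the one whose address range contains its own number. This is a sketch only: the function name expand and the (payload, copies) record layout are assumptions made for the illustration, with copies playing the role of the expand field.

```python
def expand(records):
    """records: list of (payload, copies). Slot pe_num receives (index, payload) of the
    record whose range [base, base + copies) contains pe_num."""
    total = sum(copies for _, copies in records)
    slots = [None] * total
    for pe_num in range(total):              # each simulated Paraxpand processor
        base = 0
        for index, (payload, copies) in enumerate(records):
            if base <= pe_num < base + copies:
                slots[pe_num] = (index, payload)   # "rep = index" plus the copied data
            base += copies                   # advance to the next record's range
    return slots


print(expand([("x", 1), ("y", 2), ("z", 1)]))
```

Every simulated processor runs the same loop over the records and differs only in which range test succeeds, which mirrors the data-parallel form of the C* code.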


After the para array has been expanded in domain Paraxpand, each
para(n) is computed recursively as shown in Figure 9. The first entry
in each para array (para[0]) will be the first integer in a paraffin
molecule representation. The second entry (para[1]) represents the
number of 3's in a representation. All other entries are found by
recursive substitution. As an example of the recursive substitution,
if a para array has an entry in para[2], such as processor 2 in
Figure 8, then the sub-molecule representation for n = 2 is appended
onto the representation for that molecule. If the entry is greater
than one, such as processor 8 in Figure 8 for para[2], then the
sub-molecule is copied that number of times into the representation
for the molecule.


The C* algorithm performs the above operations in domain Para- 
xpand within a sequential loop. Each processor builds up a paraffin 
molecule in the array polymer. The code that performs this operation 
is listed below and shown graphically in Figure 9. 


    /** loop thru all possible carbon numbers **/
    for(k = 2; k < N; k++) {
        [domain Paraxpand].{
            if((parray[k] != 0) && (carb_num > k)) {
                /** substitute k -> k+1, k+2, ..., N **/
                pfnptr = &Pxpnd_Pe[target];     /** poly pointers **/
                /** number of integers in k molecule **/
                dsize = pfnptr -> lngth;
                /** m is poly; everyone copies dsize int's **/
                for(m = 0; m < dsize; m++) {
                    /** get polymer for k carbons **/
                    paraffin[m] = pfnptr -> polymer[m];
                }
                /** pe copies polymer parray[k] times **/
                for(m = 0; m < parray[k]; m++) {
                    /** append polymer k **/
                    for(cnt = 0; cnt < dsize; cnt++)
                        polymer[lngth + cnt] = paraffin[cnt];
                    lngth += dsize;     /** update your length **/
                } } } }


Figure 8 - Expanded Para Array in domain Paraxpand


The variable lngth in the above code holds the current number of
integers for each paraffin molecule. Each processor that is performing
an update on its own polymer array is required to read lngth as well
as the polymer array of the target processor to determine the current
number of integers in the array. Each processor then sets a temporary
variable dsize equal to the targeted processor's lngth and sequentially
copies the polymer array into the dummy array paraffin. The dummy
array paraffin is then appended parray[k] times onto the current
polymer array.
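The copy-then-append step for one processor can be sketched as follows. This is an illustrative Python fragment rather than the C* code: the function name substitute is hypothetical, Python lists replace the fixed-size arrays, and the integer values in the example call are arbitrary placeholders, not actual molecule representations.

```python
def substitute(polymer, target_polymer, times):
    """Append a private copy of target_polymer to polymer the given number of times."""
    paraffin = list(target_polymer)   # private copy, like the dummy array paraffin
    result = list(polymer)
    for _ in range(times):            # times plays the role of parray[k]
        result.extend(paraffin)       # the length grows by dsize on every pass
    return result


print(substitute([1, 3], [3, 3, 1], 2))
```

Taking the private copy first matters in the parallel setting, because the target processor's polymer array may itself be read by many processors during the same step.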


Figure 9 - Substitution of Paraffin Molecule Based on Parray




After all substitutions have been performed in the above sequential 
loop, an equivalence relation is applied to all polymer arrays and re- 
dundant representations are deactivated. 


All arrays in the C* program were statically allocated to sizes 
greater than required, but in a more general solution, dynamic array 
allocation using malloc() would be necessary. At present C* does not 
provide a parallel malloc() to be used on the processor array. 


4. A Doctor's Office


The Doctor's Office problem, which involves a set of concurrent
processes that interact asynchronously and must respond to
asynchronous events, is easily expressed in C*. The two asynchronous
processes of patients becoming ill and doctors curing waiting patients
are performed in the two domains, domain Patient and domain Doctor.
The memory declarations for both domains are shown below.


    domain Patient {
        int sick;
        int lambda;
        int wait;
        int patient_num;
    } Patient_Pe[size_of_population];

    domain Doctor {
        int not_busy;
        int cure_time;
        int doctor_num;
        domain Patient *patient_ptr;
        int patient_id;
    } Doctor_Pe[number_of_doctors];


The number of processors declared in each domain is determined
by the number of patients and doctors involved in the simulation. The
integer variables lambda in domain Patient and cure_time in domain
Doctor represent the random times generated for patients becoming
sick and a doctor's cure time, respectively. All random sick times for
patients are generated in parallel, and the random cure times for the
doctors are generated as each doctor is assigned to a patient.




At the start of the algorithm, all patients generate a random sick
time. At each simulation step, the random sick times are decremented
in parallel. When a sick time is decremented to zero, the patient
enters the doctor's office to be cured. The code used to decrement the
random sick times is:


else if ((sick == NOT_SICK) && (lambda >0 )) { 
lambda -= 1; 
if(lambda == 0) sick = SICK; 

} 


The receptionist places all new sick patients (lambda = 0) into a
wait queue and sets each patient's wait field equal to their position
in the queue. The receptionist then signifies that the patient has
been placed in the queue by setting the patient's sick flag to QUEUED.


The C* algorithm allows multiple patients to become sick and re- 
quest service at the same time. If more than one patient becomes sick 
at the same time, they are ordered in the queue based on their relative 
processor number. The receptionist queues all of the sick people who 
entered the office at a given time before turning their attention to 
matching waiting patients to available doctors. 
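The queueing rule (simultaneous arrivals are ordered by processor number) can be emulated with a sum reduction followed by repeated minimum reductions. The Python below is a sequential sketch under stated assumptions: the function name enqueue_new_patients is hypothetical, plain lists stand in for the per-processor fields sick and wait, and the state constants are chosen for the illustration.

```python
NOT_SICK, SICK, QUEUED = 0, 1, -1


def enqueue_new_patients(sick, wait, cnt):
    """Queue every newly sick patient in processor order; return the new queue count."""
    number_sick = sum(1 for state in sick if state == SICK)   # sum reduction
    for i in range(number_sick):
        # Minimum reduction over the processor numbers still marked SICK.
        next_patient = min(p for p, state in enumerate(sick) if state == SICK)
        sick[next_patient] = QUEUED
        wait[next_patient] = i + cnt      # position in the wait queue
    return cnt + number_sick


sick = [NOT_SICK, SICK, NOT_SICK, SICK, SICK]
wait = [None] * 5
print(enqueue_new_patients(sick, wait, 2), sick, wait)
```

The loop runs once per newly sick patient, which is the sequential part of the time step that the complexity discussion later attributes to the queueing loop.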


All doctors are queried in parallel to determine if they are busy or
available to take on new patients. Doctors who are currently busy,
represented by their busy flag = BUSY and their random cure time
greater than zero, decrement their cure times in parallel. The number
of available doctors is easily found using the reduction operation


    wait_doc = 0;
    wait_doc += not_busy;


within domain Doctor. wait_doc has to be cleared initially or the
previous value of wait_doc will be added into the parallel reduction.
If patients are in the sick queue and a doctor is available, then the
next patient is assigned to the next available doctor with the lowest
processor number.


It is interesting to note that the previous step represents an
artificial (although realistic) bottleneck in the simulation. The
syntax of C* would allow as many sick patients to be assigned to as
many free doctors as are available, all in parallel.


Keeping to the statement of the problem, the patient with the 
smallest wait time (representing their place in the sick queue) is se- 
lected and assigned to the next available doctor with the smallest rel- 
ative address. The selected doctor then generates a new random cure 
time and marks himself busy. The patient is selected by the following 
code 


    [domain Patient].{
        patient_num = (this - &Patient_Pe[0]);
        whosnext = PATIENTS;                /** reset **/
        if(sick == QUEUED) {                /** select next queued patient **/
            whosnext <?= wait;              /** next queue position **/
            next_patient = PATIENTS;        /** reset **/
            if(wait == whosnext)            /** select patient **/
                next_patient <?= patient_num;
            /** patient cured **/
            Patient_Pe[next_patient].sick = BEING_CURED;
        } }
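The selection above is two minimum reductions in sequence: first over the wait positions of the queued patients, then over the processor numbers of the patients holding that position. A sequential Python sketch of both the patient and the doctor selection follows; the function names and the list-based state are assumptions made for the illustration.

```python
NOT_BUSY, BUSY = 1, 0
QUEUED = -1


def select_next_patient(sick, wait):
    """Return the queued patient with the smallest wait position, or None."""
    queued = [p for p, state in enumerate(sick) if state == QUEUED]
    if not queued:
        return None
    whosnext = min(wait[p] for p in queued)                # first min reduction
    return min(p for p in queued if wait[p] == whosnext)   # second min reduction


def select_next_doctor(not_busy):
    """Return the free doctor with the lowest processor number, or None."""
    free = [d for d, flag in enumerate(not_busy) if flag == NOT_BUSY]
    return min(free) if free else None


print(select_next_patient([0, QUEUED, 2, QUEUED], [0, 5, 1, 3]))
print(select_next_doctor([BUSY, NOT_BUSY, NOT_BUSY]))
```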


All doctors are queried in parallel to determine if they are finished
treating a patient by the conditional if statement


    if((cure_time == CURED) && (not_busy == BUSY))


within domain Doctor. All doctors who successfully pass the
conditional above reset their patient's sick flag to indicate that the
patient is no longer sick, and reset their own not_busy flag,
performing both operations in parallel.




The overall simulation was easily expressed in C*. Matching sick
patients to doctors was performed using pointer indirection between
the two domains. The worst case computational complexity of the
algorithm is given by the for loop in domain Patient used to place
patients in the queue, and is equal to the number of patients to be
queued at each simulation time step. The syntax easily allowed the
two asynchronous events of patients becoming sick and doctors curing
patients to be expressed cleanly, with each event taking place within a
specified domain. Further, the simulation will scale well for varying
numbers of patients and doctors. The only changes to the code would
be the redeclaration of the size of the patient and doctor domains.


5. Skyline Matrix Solver 


The objective of the Skyline Matrix Solver is to solve the linear
system of equations


Ax = b


without pivoting, where A is an N x N skyline matrix and x and b are
vectors. A skyline matrix has nonzero elements in row i in columns j
through i, where $1 \le j \le i$, and also has nonzero elements in
column j in rows i through j, where $1 \le i \le j$.


Let the vector I define the skyline below the diagonal and vector J 
define the skyline above the diagonal. The C* algorithm used is a vari- 
ation of parallel Gaussian elimination where I and J are used as masks 
to avoid unnecessary operations on zero elements. The input format of 
A is assumed to be a linear array of structures that contains only the 
nonzero elements and their i, j indices. 


Unlike C, C* does not currently support "ragged" arrays.
Therefore, to avoid storing the zero elements, all nonzero elements
are stored in a linear array. In dense parallel Gaussian elimination, the
i, j array indices are required during computation to specify the rows
and columns for manipulation. The indices can be calculated repeatedly
each time they are required, or they may be calculated once at
the beginning of the program and stored.


Figure 10 - Gaussian Elimination (labels: pivot row, pivot element,
row pivots, elements to update)


For solving the skyline matrix it is easier and more computationally 
efficient to calculate the i, j indices for each nonzero element once 
and store them with the element. 


The forward elimination step of a dense parallel Gaussian elimina- 
tion is shown in Figure 10. The C* code to perform this dense parallel 
Gaussian elimination is shown below. 


    /* Forward Elimination */
    for (k=0; k<N; k++) {
        [domain Matrix].{
            /* calculate i,j indices on the fly */
            int offset = (int) (this - &A[0][0]);
            int i = offset / N;
            int j = offset % N;
            /* calculate all row pivots in parallel by */
            /* dividing each element in the column below */
            /* the pivot by the pivot */
            if ((i > k) && (j == k))
                a = a / A[k][k].a;
            /* update elements below the kth row by */
            /* row(i) = row(i) - (row(k) * rowpivot(i)) */
            if ((i > k) && (j > k))
                a = a - (A[i][k].a * A[k][j].a);
        }
        [domain RHS].{
            /* calculate i index on the fly */
            int i = (int) (this - &B[0]);
            /* update all elements of the RHS */
            /* below row k at the same time */
            if (i > k) b = b - (A[i][k].a * B[k].b);
        } }
    /* Back Substitution */
    for (k=(N-1); k>0; k--) {
        /* calculate kth element of the solution vector */
        x[k] = B[k].b / A[k][k].a;
        [domain RHS].{
            /* calculate i index on the fly */
            int i = (int) (this - &B[0]);
            /* update element of b above k element */
            if (i < k) b = b - (A[i][k].a * x[k]);
        } }
    /* calculate last element of solution vector */
    x[0] = B[0].b / A[0][0].a;
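For readers without access to C*, the same dense elimination can be followed in a sequential form. The Python below is a sketch that mirrors the steps of the listing (row pivots stored in place, update of the trailing rows and of the right-hand side, then back substitution); the function name solve_dense and the list-of-lists storage are assumptions, and the inner loops stand in for what the C* version does in a single parallel step.

```python
def solve_dense(A, b):
    """Solve Ax = b by Gaussian elimination without pivoting (inputs are not modified)."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]        # row pivot, stored in place
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]   # row(i) -= rowpivot(i) * row(k)
            b[i] -= A[i][k] * b[k]             # same update on the right-hand side
    x = [0.0] * n
    for k in range(n - 1, -1, -1):             # back substitution
        x[k] = b[k] / A[k][k]
        for i in range(k):
            b[i] -= A[i][k] * x[k]
    return x


print(solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

As in the C* code, no pivoting is performed, so the sketch assumes every pivot A[k][k] is nonzero when it is used.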


The difference between the code given above and a more efficient so- 
lution for the skyline matrix problem is: 1) the saving of the row pivots 
and pivot row in temporaries and 2) selection of active elements. 




When the row pivots are calculated, instead of being saved in A, they
are stored in the skyline vector I. Elements of the pivot row are
stored in J for easy access. Skyline vectors I and J are used to mask
unnecessary operations during the forward elimination step, as shown
in the following code segment:


    /* select nonzero elements where (i > k) && (j > k) */
    if (((i > k) && (j > k)) && ((I[i].start <= k) &&
        (J[j].start <= k)))
        a = a - (I[i].a * J[j].a);


The i, j selection is done just as in the dense case, but with added
conditionals from the I and J skyline vectors that limit arithmetic
operations to the nonzero elements. Note that I[i].a is the row pivot
for row i, J[j].a is the j-th element of the pivot row, and the start
elements define the skylines. This solution to the skyline matrix
problem is easy to express in C* and has complexity O(N). All
elements that can be updated in parallel are updated in parallel. No
zero elements are stored. Further, no operations are performed on zero
elements. The only drawback is the storage required for each
nonzero element's i, j indices.
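The effect of the skyline masks can be illustrated sequentially. In the Python sketch below, row_start[i] is assumed to hold the first nonzero column of row i and col_start[j] the first nonzero row of column j (both zero-based), matching the role of the start fields; the function name solve_skyline is hypothetical, and the matrix is kept in dense form for brevity, whereas the C* program stores only the nonzero elements.

```python
def solve_skyline(A, b, row_start, col_start):
    """Gaussian elimination without pivoting, skipping entries outside the skyline."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        pivot = A[k][k]
        # Row pivots for the rows whose skyline reaches column k (vector I).
        I = {i: A[i][k] / pivot for i in range(k + 1, n) if row_start[i] <= k}
        # Pivot-row elements for the columns whose skyline reaches row k (vector J).
        J = {j: A[k][j] for j in range(k + 1, n) if col_start[j] <= k}
        for i, row_pivot in I.items():
            for j, value in J.items():
                A[i][j] -= row_pivot * value     # masked update
            b[i] -= row_pivot * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):               # back substitution
        x[k] = b[k] / A[k][k]
        for i in range(k):
            if col_start[k] <= i:                # entries above the skyline are zero
                b[i] -= A[i][k] * x[k]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
print(solve_skyline(A, [5.0, 6.0, 5.0], [0, 0, 1], [0, 0, 1]))
```

Without pivoting, fill-in stays inside the skyline envelope, which is why masking on the start values alone is enough to skip every operation on a zero element.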


Footnotes 
1. C* is a trademark of Thinking Machines Corporation. 


2. Connection Machine is a registered trademark of Thinking 
Machines Corporation. 


3. VAX is a trademark of Digital Equipment Corporation. 
4. The definition of the keyword poly has been changed in recent
revisions of the language. We use it here in its original context for
illustrative purposes.




References 

1. Hillis, W. D. The Connection Machine. MIT Press, Cambridge, MA,
1985.

2. Knuth, D. The Art of Computer Programming, Vol. 1: Fundamental
Algorithms. Addison-Wesley, Reading, MA, 1973.

3. Thinking Machines Co. C* Reference Manual, Version 4.3.
Thinking Machines Corporation, Cambridge, MA, May 1988.

4. Turner, D. A. The semantic elegance of applicative languages. In
Proceedings of the Conference on Functional Programming
Languages and Computer Architecture, Portsmouth, NH, October
1981, 85-92.




Appendix 


/***********************************************/
/**            Hamming's Problem              **/
/***********************************************/

#include <stdio.h>
void Batcher(int);

#define MAXSIZE 150
#define BIGSIZE 4096

domain Primemult {
    int primenum;
} Prime_Procs[MAXSIZE];

domain PrimeExpand {
    mono int indx, jndx;
    mono int frontprime;
    int product = 1;
    int Origin;
    int Location;
} Prime_Exprocs[BIGSIZE];

main(argc, argv)
int argc;
char *argv[]; {
    int N;
    int maxoffset = 1;
    int maxpe = 0;
    int rep = 1;
    int prime;
    FILE *fp;

    fp = fopen("primelist.dat","r");        /** input file of primes **/
    fscanf(fp, "%d\n", &N);                 /** read in the value N **/
    while (fscanf(fp,"%d ",&prime)==1) {    /** main while loop **/
        printf("%d\n", prime);
        [domain Primemult].{
            mono int max = 0;
            mono int maxaddr = 0;
            mono int pwrtwo = 1;
            primenum = prime;
            int addr = (int) (this - &Prime_Procs[0]);
            while( (max >?= primenum) <= N ){   /** expand prime **/
                if( addr >= pwrtwo)
                    primenum *= Prime_Procs[addr-pwrtwo].primenum;
                pwrtwo *= 2;
            } /** end while ***/
            /*** find the # of pe's <= N to be used later ***/
            if (primenum <= N)
                maxaddr >?= addr;
            Prime_Procs[maxaddr + 1].primenum = 1;  /*** set b**0 ***/
            maxaddr += 1;
            maxpe = maxaddr;
            maxoffset *= (maxaddr+1);
        } /** end of [domain Primemult] **/

        /*** perform multiplication of primes in domain PrimeExpand ***/
        [domain PrimeExpand].{
            int offset = (int) (this - &Prime_Exprocs[0]);
            int modaddr = offset % (maxoffset);
            /*** place the targeted number in the front end ***/
            for(indx = 0; indx <= maxpe; indx++) {
                frontprime = Prime_Procs[indx].primenum;
                if( modaddr < (indx*rep + rep) && modaddr >= rep*indx)
                    product *= frontprime;
            } /**** end for ***/
            rep = maxoffset;
        } /*** end domain PrimeExpand ***/
    } /*** end of main while loop ***/

    fclose(fp);                             /*** close the file **/

    /**** Batcher sorts the values ****/
    Batcher(rep);                           /** batcher sort routine **/
    /**** Select and print out the solution ****/
    [domain PrimeExpand].{
        for(jndx = 0; jndx <= (maxoffset-1); jndx++) {
            if (Prime_Exprocs[jndx].product <= N)
                printf("Prime_Exprocs[%d]= %d\n", jndx,
                       Prime_Exprocs[jndx].product);
        } /** end for **/
    } /*** end domain PrimeExpand ***/
} /** end main **/


/***********************************************/
/**       Paraffin Enumeration Problem        **/
/***********************************************/

#include <stdio.h>
#include <math.h>

#define N 10

domain Enum {   /** mono ints temporaries used in reductions **/
    mono int enumnum;
    mono int boolflag;
    mono int elem;
    int x;
    int y;
    int z;
    int flag;               /** flag = {0,1} used as boolean **/
} EnumPe[1024];

domain Paraffin {
    mono int pminaddr;
    int pelem;
    int expand = 1;
    /** {a,b,c} **/
    int a;
    int b;
    int c;
    int para[N];
} ParaPe[1024];

domain Paraxpand {
    /** mono ints all for looping and temporary variables **/
    mono int base = 0;
    mono int scalarbaseb;
    /** m = loop index for transferring arrays from domain Paraffin **/
    int m;
    int pe_num;             /** pe_num is relative address **/
    int rep;                /** parallel counting variable **/
    int dsize;              /** temp for copying arrays = lngth of polymer **/
    int cnt;                /** temp "for" index to copy arrays **/
    int lngth;              /** current size of polymer array **/
    int carb_num = 0;       /** number of carbons **/
    int mod;                /** pe's modulo address **/
    int baseb;              /** temporary storage **/
    int thispe = 0;         /** thispe = {0,1} used as boolean flag **/
    int parray[N];
    int paraffin[N];
    int polymer[30];
    int carb_count[N];
    domain Paraxpand *pfnptr;   /** poly pointer to domain Paraxpand **/
    domain Paraffin *transptr;  /** poly pointer to domain Paraffin **/
} Pxpnd_Pe[1024];

main() {
    int max_size;           /** max_size = number of possible permutations **/
    int index;
    int this_rep;
    int upper;
    int total_expand;
    int k;
    int car_base[N];
    int exp[N];
    /** front end temporary storage **/
    int listx[1024];
    int listy[1024];
    int listz[1024];
    int ENum[1024];
    int tempx;
    int tempy;
    int count = 0;          /** index for front end array **/
    int loop;
int loop; 
    for(index = 2; index <= N; index++) {
        max_size = pow(index,3);    /** max number of enumerations **/
        [domain Enum].{
            int mody;
            mono int next = 0;
            flag = 0;               /** initialize all flags to zero **/
            int offset = (int) (this - &EnumPe[0]);
            if(offset <= max_size) {    /** enum for values <= maxsize **/
                x = offset % index;     /** mod address a **/
                mody = offset / index;  /** mod address b **/
                y = mody % index;
                z = index-1-x-y;
            } /** end if **/
            enumnum = 0;
            if((x + y) <= (index-1) && (x+y+z) == (index-1)) {
                enumnum += (poly) 1;
                next <?= offset;        /** find first values **/
                flag = 1;               /** initialize PE flag **/
                boolflag = 1;           /** initialize global flag **/
                while ((boolflag) == 1) {
                    /** load front end with {x,y,z} **/
                    listx[count] = EnumPe[next].x;
                    listy[count] = EnumPe[next].y;
                    listz[count] = EnumPe[next].z;
                    ENum[count] = index;    /** enum number **/
                    for(elem=0; elem <= 2; elem++) {    /** rotate values **/
                        if ((listx[count] == x) && (flag == 1)) {
                            /** check all x's **/
                            if((listy[count] == y && listz[count] == z) ||
                               (listy[count] == z && listz[count] == y))
                                flag = 0;   /** set PE flag **/
                        } /** end if x[count] = x **/
                        tempx = listx[count];           /** temp storage **/
                        tempy = listy[count];           /** temp storage **/
                        listy[count] = listz[count];    /** rotate z **/
                        listx[count] = tempy;           /** rotate y **/
                        listz[count] = tempx;           /** rotate x **/
                    } /** end loop 3 times **/
                    if(flag == 1) {
                        next = 0;           /** reset pointer **/
                        next >?= offset;    /** choose next set of {x,y,z} **/
                    }
                    count += 1;     /** increment front end count **/
                    boolflag = 0;   /** reset global flag **/
                    boolflag >?= flag;
                } /** end of while **/
            } /** found # of combinations of x+y+z = N-1 **/
        } /** end domain Enum **/
    } /** end for **/


    [domain Paraffin].{     /** stretch out all iterations and update **/
        int paraoffset = (this - &ParaPe[0]);   /** offset **/
        /** sequential reading in of values into domain Paraffin **/
        for(loop = 0; loop < count; loop++) {
            ParaPe[loop].a = listx[loop];
            ParaPe[loop].b = listy[loop];
            ParaPe[loop].c = listz[loop];
            ParaPe[loop].pelem = ENum[loop];
        } /** end for read in values **/
        /*** sum number of each value in parallel ***/
        para[a] += 1;
        para[b] += 1;
        para[c] += 1;
        for(loop = 2; loop < N; loop++) {
            exp[loop] = 0;
            if (pelem == loop) exp[loop] += expand;
            /** find # of pe's **/
            if( (pelem > loop) && para[loop] > 0) expand *= exp[loop];
        } /** end of for plength **/
        if (paraoffset < count) {
            total_expand = 0;       /** reset count **/
            total_expand += expand;
        } /** end if ***/
    } /*** end domain Paraffin ***/


    [domain Paraxpand].{
        pe_num = (int) (this - &Pxpnd_Pe[0]);
        for(index = 0; index < count; index++) {
            transptr = &ParaPe[index];
            if(pe_num == 0)     /*** select first elements pointer **/
                upper = transptr -> expand;
            if ((pe_num < base + upper) && (pe_num >= base)) {
                rep = index;                    /*** mark the rep number ****/
                carb_num = transptr -> pelem;   /** element number **/
                for(loop = 0; loop < N; loop++)
                    parray[loop] = transptr -> para[loop];
            } /** end if **/
            base += upper;      /** update the base address **/
        } /** end for **/
    } /** end domain Paraxpand **/


    /** initialize first entry **/
    [domain Paraxpand].{
        lngth = 0;
        if(parray[0] != 0){
            polymer[0] = parray[0];
            lngth += 1;
        } /** initialized first group ***/
        for(m = 0; m < parray[1]; m++)
            polymer[lngth++] = 3;   /** initialized second group **/
    } /** end domain Paraxpand ***/


    /*** recursion step to update paraffin molecule representations ***/
    for(k = 2; k < N; k++) {    /** loop through all values **/
        [domain Paraxpand].{
            pe_num = (int) (this - &Pxpnd_Pe[0]);
            int pxd_enum;   /*** enumeration integer for each pe **/
            if (carb_num == k) {
                car_base[k] = count;    /** reset base **/
                car_base[k] <?= pe_num;
            } /** end if **/
            for(loop = k+1; loop < N; loop++) {
                if ((parray[k] != 0) && (carb_num == loop)) {
                    baseb = count;      /** reset to find base addr **/
                    baseb <?= pe_num;   /** find base of pelem N **/
                } /** end if **/
            } /** end for **/
            /***** in parallel *****/
            if ((parray[k] != 0) && (carb_num > k)) {
                pxd_enum = pe_num - baseb;  /** enumerate relative **/
                mod = pxd_enum % exp[k];    /** offset into array **/
                pfnptr = &Pxpnd_Pe[car_base[k]+mod];
                /*** get personal copy of paraffin length **/
                dsize = pfnptr -> lngth;
                for(m = 0; m < dsize; m++) {
                    paraffin[m] = pfnptr -> polymer[m];
                } /** end for (m=0; m < dsize; m++) **/
                for(m = 0; m < parray[k]; m++) {
                    for(cnt = 0; cnt < dsize; cnt++)
                        polymer[lngth + cnt] = paraffin[cnt];
                    lngth += dsize;
                } /** end for **/
            } /** end if **/
        } /*** end domain paraxpand ***/
    } /*** end for ***/


    /*** perform equivalence relation to eliminate redundant molecules ***/
    [domain Paraxpand].{
        int equiv = 1;          /** flag for equivalence relation **/
        mono int pecount = 0;
        mono int next;
        mono int equivflag;
        /*** in parallel, count # 1's, 2's, 3's ***/
        for (m = 0; m < lngth; m++) {
            int carb_index = polymer[m];
            carb_count[carb_index] += 1;    /** count number of carbs **/
        } /** end for m=0 ***/
        /*** find redundant molecules in parallel ***/
        next <?= pe_num;        /** find first values **/
        equiv = 0;              /** reset all flags **/
        thispe = 0;             /** initialize PE flag **/
        if (pe_num < total_expand) equiv = 1;   /** out of bounds pe's **/
        equivflag = 1;          /** initialize global flag **/
        while ((equivflag) == 1) {
            Pxpnd_Pe[next].thispe = 1;
            this_rep = Pxpnd_Pe[next].rep;
            Pxpnd_Pe[next].equiv = 0;   /** set compare value to 0 **/
            listx[pecount] = Pxpnd_Pe[next].carb_count[1];
            listy[pecount] = Pxpnd_Pe[next].carb_count[2];
            listz[pecount] = Pxpnd_Pe[next].carb_count[3];
            /*** check all values in parallel ***/
            if ((listx[pecount] == carb_count[1]) && (equiv == 1) &&
                (this_rep != rep)) {    /** check all molecules **/
                if ((listy[pecount] == carb_count[2]) &&
                    (listz[pecount] == carb_count[3]))
                    equiv = 0;          /** set PE flag **/
            } /** end of check **/
            /*** find next carbon to check for redundancies **/
            if (equiv == 1) {
                next = total_expand;    /** reset pointer **/
                next <?= pe_num;        /** choose next set of {x,y,z} **/
            }
            pecount += 1;       /** increment front end count **/
            equivflag = 0;      /** reset global flag **/
            equivflag >?= equiv;
        } /** end of while **/
    } /** end of domain paraxpand **/


    /***** main print statement *****/
    [domain Paraxpand].{
        for(loop = 0; loop < count; loop++) {
            printf("\nPe[%d] carb_num = %d\n", loop, Pxpnd_Pe[loop].carb_num);
            if (Pxpnd_Pe[loop].thispe == 1) {
                for (mscalar = 0; mscalar < Pxpnd_Pe[loop].lngth; mscalar++)
                    printf("%d", Pxpnd_Pe[loop].polymer[mscalar]);
            } /*** end if this == 1 ***/
        } /** end for loop = 0 printf ***/
    } /*** [domain Paraxpand] ***/
} /** end main **/


/***********************************************/
/**          Doctors Office Problem           **/
/***********************************************/

#define PATIENTS 20
#define DOCTORS 5
#define NOT_BUSY 1
#define BUSY 0
#define SICK 1
#define NOT_SICK 0
#define QUEUED -1
#define TREATING 2
#define CURED 0
#define simulation_time 50

#include <stdio.h>
#include <stdlib.h>

domain Patient {
    int sick = NOT_SICK;
    int lambda = 0;
    int wait;
    int patient_num;
} Patient_Pe[PATIENTS];

domain Doctor {
    int not_busy = NOT_BUSY;
    int cure_time = CURED;
    int doctor_num;
    int patient_id;
    domain Patient *patient_ptr;
} Doctor_Pe[DOCTORS];


main() {
    int number_sick;        /** number of sickies **/
    int cnt = 0;
    int i;
    int next_patient;
    int whosnext;
    int wait_doc;
    int next_doc;
    int loop;

    /**** variables for printing out results ***/
    int max_alloc = PATIENTS;
    int sick_count = 0;
    int Sick_List[100];
    int Time_List[100];
    int doc_finish;
    int num_cured;
    int cure_count = 0;
    int Doctor_Cured[100];
    int Patient_Cured[100];

    for(loop = 0; loop < simulation_time; loop++) {   /** loop forever **/
        [domain Patient].{
            patient_num = (int) (this - &Patient_Pe[0]);
            /*** take care of generating random times ***/
            if ((sick == NOT_SICK) && (lambda == 0)) {
                /** insert your favorite random number gen routine here **/
                (void) srand(1);
                lambda = rand();
            }
            else if ((sick == NOT_SICK) && (lambda > 0)) {
                /** decrement all well counts in parallel **/
                lambda -= 1;
                if (lambda == 0) sick = SICK;
            }
            /*** check for new people to be seen ***/
            number_sick = 0;    /** reset **/
            if (sick == SICK) number_sick += (poly)1;   /** count # sick **/
            /*** if new patient, put in queue ***/
            for(i = 0; i < number_sick; i++) {
                next_patient = PATIENTS;    /** reset **/
                if (sick == SICK) next_patient <?= patient_num;
                Patient_Pe[next_patient].sick = QUEUED;
                Patient_Pe[next_patient].wait = i + cnt;
                /*** gather stats on front end ***/
                Sick_List[sick_count] = next_patient;
                Time_List[sick_count++] = loop;
            }
            /** update number of sick patients **/
            cnt += number_sick;
        }


        /**** check the doctors *****/
        /*** # doc's on coffee break **/
        [domain Doctor].{
            /*** update doctors cure times **/
            if ((cure_time > 0) && (not_busy == BUSY)) cure_time -= 1;
            /*** update non-busy doc's ***/
            wait_doc = 0;
            wait_doc += not_busy;
        }

        /*** check for queued patients if doctors available ***/
        if ((cnt != 0) && (wait_doc != 0)) {
            /** select next doctor ***/
            [domain Doctor].{
                doctor_num = (int) (this - &Doctor_Pe[0]);
                next_doc = DOCTORS;
                if (not_busy == NOT_BUSY) next_doc <?= doctor_num;
            }
            /** select next patient in queue ***/
            [domain Patient].{
                patient_num = (int) (this - &Patient_Pe[0]);
                whosnext = PATIENTS;
                if (sick == QUEUED) {
                    whosnext <?= wait;
                    next_patient = PATIENTS;
                    if (wait == whosnext) next_patient <?= patient_num;
                    Patient_Pe[next_patient].sick = TREATING;
                } }
            /*** assign patient to doctor **/
            [domain Doctor].{
                Doctor_Pe[next_doc].not_busy = BUSY;
                Doctor_Pe[next_doc].patient_id = next_patient;
                Doctor_Pe[next_doc].cure_time = rand();
                printf("Doctor[%d] <- Patient[%d] at time[%d]\n",
                       next_doc, next_patient, loop);
            }
            /**** update # patients remaining in queue ***/
            cnt -= 1;
        } /** end if **/


/*** check for cured patients and update cure times ***/ 
[domain Doctor]. { 
/** find the number cured at this time **/ 
if ((cure_time == CURED) && (not_busy == BUSY)) { 
num_cured = 0; 
num_cured += (poly)1; 
} 
for(i=0; i < num_cured; i++) { 
if ((cure_time == CURED) && (not_busy == BUSY)) { 
doc_finish = DOCTORS; /** reset **/ 
doc_finish <?= doctor_num; 
Doctor_Pe[doc_finish].patient_ptr =
&Patient_Pe[Doctor_Pe[doc_finish].patient_id];
Doctor_Pe[doc_finish].patient_ptr -> sick = NOT_SICK;
Doctor_Pe[doc_finish].not_busy = NOT_BUSY;
Doctor_Cured[cure_count] = doc_finish;
Patient_Cured[cure_count++] =
Doctor_Pe[doc_finish].patient_id;
} } } }


printf ("People in order of sickness\nPatient\t\tSick_time\n") ; 
for(i=0; i < sick_count; i++) 
printf(" %d %d\t\t%d\n",i,Sick_List[i],Time_List[i]);
printf ("Doctor <-> cured <-> Patient\n");
for(i=0; i < cure_count; i++)
printf (" %d\t\t%d\n",Doctor_Cured[i],Patient_Cured[i]);
} /** end main **/ 


130 D. L. Andrews and E. Barszcz 


/**************************************************/
/**           Skyline Matrix Problem             **/
/**************************************************/


#include <stdio.h>


#define N 4 
#define BIG 1024 


domain Matrix { float a; int i; int j; } A[N][N];
domain RHS { float b; int i; } B[N];

domain Skyline { float s; int i; int j; } Sky[BIG];
domain Vector { float element; int start; } I[N], J[N];


void main(argc, argv)
int argc;
char *argv[];
{ int k, l, m;
float x[N], element; 
float rowsum; /* need because can't reduce to parallel lvalue */ 
int NonZeros; 
float Pivot; 
FILE *fp; 


fp = fopen(argv[1], "r");


/* read the matrix */
for (k=0; k<N; k++)
for (l=0; l<N; l++) {
fscanf(fp, "%f", &element);
A[k][l].a = element;
} 


/* initialize */ 
[domain RHS].{ i = (int) (this - &B[0]); }


[domain Matrix].{
int offset = (int) (this - &A[0][0]);
i = offset / N;
j = offset % N;


} 


/* initialize values of the RHS */ 

for (k=0; k<N; k++) { 
rowsum = 0.0; 
[domain Matrix].{ if (i == k) rowsum += a; }
B[k].b = rowsum; 


} 


/* put the matrix into skyline form */ 
NonZeros = map(); 




printf ("NonZeros %d\n", NonZeros) ; 
/* forward elimination */ 
for (k=0; k<(N-1); k++) { 
[domain Skyline].{
/* find the pivot element */
if ((i == k) && (j == k)) Pivot ,= s;
/* calculate the row pivots */
if (((i > k) && (j == k)) && (I[i].start <= k)) {
s = s / Pivot;
I[i].element = s;
}
/* save of row k */
if (((i == k) && (j > k)) && (J[j].start <= k)) J[j].element = s;
/* update elements in parallel */
if (((i > k) && (j > k)) &&
((I[i].start <= k) && J[j].start <= k))
s = s - (I[i].element * J[j].element);
}
[domain RHS].{
if (i > k) b = b - (I[i].element * B[k].b);
} }


/* back substitution */ 
for (k=(N-1); k>0; k--) {
[domain Skyline].{
if (((i <= k) && (j == k)) && (J[k].start <= k)) J[i].element = s;
}
x[k] ,= B[k].b / J[k].element;
[domain RHS].{
if ((i < k) && (J[k].start < k)) b = b - (J[i].element * x[k]);
} }


[domain Skyline].{ if ((i == 0) && (j == 0)) Pivot ,= s; }


x[0] = B[0].b / Pivot;

/* print out solution vector */ 

printf ("x\n"); 

for (k=0; k<N; k++) printf("%f\n", x[k]); 
} /* end of main */ 


int map ()
{ int k, l, m;
[domain Vector].{ start = -1; }
/* assign each nonzero entry to a position in Sky */
/* if can figure out a neato-keeno way to do this */
/* in parallel, go for it... */
m = 0;
for (k=0; k<N; k++)
  for (l=0; l<N; l++)
    if (A[k][l].a != 0.0) {
      Sky[m].s = A[k][l].a;
      Sky[m].i = A[k][l].i;
      Sky[m].j = A[k][l].j;
      m++;
      if (I[k].start == -1) I[k].start = l;
      if (J[l].start == -1) J[l].start = k;
    }
return (m);
} /* end of map */


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved. 133 


Haskell Solutions to the Language Session Problems 
at the 1988 Salishan High-Speed Computing Conference 


Paul Hudak 
Steve Anderson 
Yale University 
Department of Computer Science 
New Haven, CT 06520 


1. Introduction


Haskell is a new functional language, named after the logician 
Haskell B. Curry, that was designed by a 15-member international 
committee representative of the functional programming research 
community [7]. The committee was formed because it was felt that
research and application of modern functional languages was being 
hampered by the lack of a common language. The committee’s goals 
were that Haskell should: 


1. be suitable for teaching, research, and applications, in- 
cluding building large systems; 


2. be completely described via the publication of a formal 
syntax and semantics; 


3. be freely available, such that anyone is permitted to 
implement the language and distribute it to whomever 
they please; 


4. be based on ideas that enjoy a wide consensus; and 


5. be useable as a basis for further programming language 
research. 




Haskell is a general purpose, purely functional programming lan- 
guage exhibiting many of the recent innovations in programming lan- 
guage research, including higher-order functions, non-strict functions 
and data structures, static polymorphic typing, user-definable alge- 
braic data types, pattern-matching, list comprehensions, a module sys- 
tem, and a rich set of primitive data types, including arbitrary and 
fixed precision integers, and complex, rational, and floating-point 
numbers. In addition it has several novel features that give it addi- 
tional expressiveness, including an elegant form of overloading using a 
notion of type classes, a flexible I/O system that unifies the two most 
popular functional I/O models, and an array datatype that allows purely 
functional, monolithic arrays to be constructed using “array compre- 
hensions.” 


The reader will note that we did not describe Haskell as a parallel 
programming language; indeed it is not. However, much research in 
recent years has centered on the implementation of functional lan- 
guages on parallel machines, including the building of special-purpose 
hardware such as dataflow and reduction machines. We will say little 
about these issues here, other than noting how the solutions to the 
problems presented have considerable inherent parallelism. 


Given our space constraints, it is impossible for us to describe 
Haskell in its entirety; our goal is only to give the reader some famil- 
iarity with the language by giving solutions to the four language session 
problems presented at the 1988 Salishan Conference on High-Speed 
Computing. The reader is referred to the Haskell Report [7] for a 
complete definition of the language. Of course, studying the solutions 
presented here will give the reader an idea of what programming in 
any of a number of modern functional languages is like; indeed, all of 
the solutions given have run on our implementation of Alfl, a functional 
language designed and implemented at Yale. 




2. Brief Overview of Haskell 


In this section we will describe enough Haskell syntax to allow un- 
derstanding the programs given later. As a result, there are significant 
parts of Haskell that will not be described at all, most notably user-de- 
fined data types, modules, and I/O. In addition, the syntax used in this 
paper corresponds to the design as it existed at the time of the con- 
ference; since then, some changes have been made. 


Haskell is an “equational” language similar to Miranda, Hope, and
several other modern functional languages. A function is defined by a
set of equations which can pattern-match against their arguments.
Lists are written [a,b,c] with [] being the empty list. An element a
may be added to the front of the list as by writing a:as. Two lists may
be appended together by l1++l2. Here is an example of pattern-
matching:


member x []     = False
'      (y:ys)   = if x==y then True
                           else member x ys


The “tick mark” on the second line is a convenient abbreviation for 
the initial subsequence on the preceding line (so that the arity of the 
two equations is the same). 
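The tick-mark abbreviation belongs to the language design as it stood at the time of the conference. In the Haskell that was later standardized, each equation repeats the function name instead. A minimal sketch of the same definition in that form follows; the type signature is our addition and does not appear in the original text.

```haskell
-- The tick mark written out: every equation names the function explicitly.
member :: Eq a => a -> [a] -> Bool
member x []     = False
member x (y:ys) = if x == y then True else member x ys
```

Loaded into an interpreter, `member 3 [1,2,3]` evaluates to `True` and `member 4 [1,2,3]` to `False`.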


A function f x = x+1 may also be defined “anonymously” with the 
expression \x -> x+1, and thus (\x -> x+1) 2 returns 3. 


List comprehensions are a concise way to define lists, and are best 
explained by example: 


[ (x,y) | x<-xs, y<-ys ]


which constructs the list of all pairs whose first element is from xs, 
and second is from ys. “Infinite lists” may also be defined, and thanks 
to lazy evaluation, only that portion of the list that is needed by some 




other part of the program is actually computed. Thus the infinite list 
of ones can be defined by: 


ones = 1 : ones 


The notation [a..b] denotes the list of integers from a to b, inclusive, 
and [a..] is the infinite ascending list of integers beginning with a. 
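As a small illustration of lazy evaluation over infinite lists, the sketch below takes finite prefixes of two infinite lists. It uses the standard Prelude functions `take` and `takeWhile` (the latter spelled `takewhile` in this paper); the names `firstOnes` and `smallSquares` are hypothetical and chosen only for this example.

```haskell
-- An infinite list; only the demanded prefix is ever computed.
ones :: [Integer]
ones = 1 : ones

-- Demanding three elements forces exactly three cells of the list.
firstOnes :: [Integer]
firstOnes = take 3 ones

-- [1..] is infinite too; takeWhile stops at the first square that is not below 30.
smallSquares :: [Integer]
smallSquares = takeWhile (< 30) [ x*x | x <- [1..] ]
```

Here `firstOnes` is `[1,1,1]` and `smallSquares` is `[1,4,9,16,25]`; neither evaluation tries to build the whole infinite list.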


There are many standard utility functions defined on lists. Aside 
from member defined earlier, the ones we need in this paper are the 
following: 


-- takes elements of list while pred is true
takewhile pred []     = []
'              (a:as) = if (pred a) then
                           (a : takewhile pred as)
                        else
                           []

-- folds list from left
foldl f a []     = a
'         (x:xs) = foldl f (f a x) xs

-- folds list from right
foldr f a []     = a
'         (x:xs) = f x (foldr f a xs)

-- forms list of pairs from pair of lists
zip []     bs     = []
'   as     []     = []
'   (a:as) (b:bs) = (a,b) : zip as bs

-- removes duplicates from list
nodups []     = []
'      (x:xs) = x : nodups [ y | y <- xs, y /= x ]




In Haskell, function application always has higher precedence than 
any infix operation, and thus 


a : takewhile pred as 
is parsed as 
a: (takewhile pred as) 


Note in zip the use of tuples, which in Haskell are constructed in arbi- 
trary but finite length by writing 


(a, b, ..., c)


(the parentheses are mandatory); tuples may be pattern-matched like 
lists. Finally, note that for foldl and foldr the following relationships 
hold: 


foldl f a [x1, x2, ..., xn] ==> (f ... (f (f a x1) x2) ... xn)
foldr f a [x1, x2, ..., xn] ==> (f x1 (f x2 ... (f xn a) ... ))
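The difference between the two nestings is easiest to see with an operator that is not associative. The following sketch uses subtraction with the Prelude versions of `foldl` and `foldr`; the names `leftResult` and `rightResult` are hypothetical.

```haskell
-- foldl nests to the left: ((10 - 1) - 2) - 3
leftResult :: Int
leftResult = foldl (-) 10 [1,2,3]

-- foldr nests to the right: 1 - (2 - (3 - 10))
rightResult :: Int
rightResult = foldr (-) 10 [1,2,3]
```

Working the parentheses by hand gives 4 for `leftResult` and -8 for `rightResult`.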


Haskell also has arrays and a special syntax for manipulating them. 
A two-dimensional array a is indexed at position (i,j) via the expres-
sion a!(i,j). New arrays are constructed using the primitive function
array, which takes a set of bounds and a list comprehension as argu-
ments; the list comprehension specifies the set of index/value pairs
for the new array. For example: 


array ((1,1),(n,n))
      [ ((i,j), k*a!(i,j)) | i<-[1..n], j<-[1..n] ]


returns an n x n matrix representing the matrix a multiplied by the
scalar k.
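For readers who want to try this, here is a minimal sketch of the same scaling written against the `Data.Array` module of the `array` package that ships with GHC. It assumes a square matrix whose lower bound is (1,1); the names `scaleMatrix` and `example` are hypothetical.

```haskell
import Data.Array

-- Build a new array monolithically: every element is k times the old one.
-- Assumes a is square with lower bound (1,1).
scaleMatrix :: Double -> Array (Int,Int) Double -> Array (Int,Int) Double
scaleMatrix k a =
  array bnds [ ((i,j), k * a!(i,j)) | i <- [1..n], j <- [1..n] ]
  where bnds@(_, (n, _)) = bounds a

-- A 2 x 2 test matrix, filled in row-major order.
example :: Array (Int,Int) Double
example = listArray ((1,1),(2,2)) [1,2,3,4]
```

With these definitions, `elems (scaleMatrix 2 example)` is `[2.0,4.0,6.0,8.0]`.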


This description of Haskell is quite brief, but should be enough to 
make the programs given later self-explanatory. Nevertheless, experi- 
ence with at least one other functional language would be beneficial. 




3. Hamming’s Problem (Extended)


A natural way to solve this problem in Haskell is to generate an in- 
finite increasing sequence of hamming numbers, and then filter out 
those less than n. But how do we create that infinite sequence? To 
start, let’s define a function scale that multiplies every element in a 
stream by a certain number: 


scale p xs = [ p*x | x<-xs ] 


Now note that a constructive way to express the problem is as an 
inductive definition: 


• 1 is in the output sequence.


• For each prime p, if k is in the output sequence, then
  so is k * p.


We can construct a dataflow diagram for this as shown in Figure 1, 
where the repeating pattern has been highlighted in a box. Capturing 
the box’s functionality in a function f, and using foldl to “unfold” f
over the list of primes, we arrive at this straightforward program to
realize the dataflow diagram:


hamming primes =
    h where h      = 1 : foldl f [] primes
            f xs p = merge xs (scale p h)


where merge merges two streams in increasing numeric order.
Unfortunately, merge must also remove duplicates, since this simple
definition will construct every permutation of the factors for a particu-
lar number. For example, it will generate three twelves:


2*2*3, 2*3*2, and 3*2*2.


This is of course inefficient, and we would prefer a solution that 
avoided the extra multiplications. 
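The paper does not list the duplicate-removing merge that this naive version needs. The sketch below supplies one plausible definition under the name `mergeNoDups` (our assumption, as is the name `naiveHamming`) so that the naive program can be run in present-day Haskell.

```haskell
-- Multiply every element of a stream by p.
scale :: Integer -> [Integer] -> [Integer]
scale p xs = [ p*x | x <- xs ]

-- Merge two increasing streams, keeping one copy of any common element.
mergeNoDups :: [Integer] -> [Integer] -> [Integer]
mergeNoDups (a:as) (b:bs)
  | a < b     = a : mergeNoDups as (b:bs)
  | a > b     = b : mergeNoDups (a:as) bs
  | otherwise = a : mergeNoDups as bs
mergeNoDups [] bs = bs
mergeNoDups as [] = as

-- Naive solution: every sub-stream is scaled from the entire list h,
-- so equal products meet in the merges and must be discarded there.
naiveHamming :: [Integer] -> [Integer]
naiveHamming primes = h
  where h      = 1 : foldl f [] primes
        f xs p = mergeNoDups xs (scale p h)
```

The repeated products are still computed; the merge only throws the extra copies away, which is the inefficiency the text describes.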




Figure 1 - Naive Hamming Solution 


The problem stems from the fact that the sub-streams are gener-
ated recursively from the entire list h. What we really want is some-
thing that “chases its tail” so as to avoid generating all of the combina-
tions. The dataflow diagram in Figure 2 in fact does just that: note
how the result of each merge is fed back only to itself, thus avoiding
the duplicates. As before we can express this result by abstracting the
repeating functionality and using foldl:


hamming primes = 1 : foldl f [] primes
    where f xs p = h where
                     h = merge (scale p (1:h)) xs


in which case merge is defined simply by: 




Figure 2 - Hamming Solution Without Duplicates 


merge (a:as) (b:bs) = if a < b then a : merge as (b:bs)
                               else b : merge (a:as) bs
'     []     bs     = bs
'     as     []     = as

and the result is just:


takewhile (\x -> x < n) (hamming primes)

using the utility takewhile defined in Section 2.


Here is a sample output transcript, run on our Alfl implementation: 




takewhile (\x -> x < 46) (hamming [2,3,5]);
Result: [1,2,3,4,5,6,8,9,10,12,15,
         16,18,20,24,25,27,30,32,36,40,45]
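The duplicate-free program carries over to present-day Haskell almost unchanged. The sketch below writes out the tick-mark equations in full and adds type signatures (both our additions); it relies on the Prelude's `foldl` and uses `takeWhile` in place of the paper's `takewhile`.

```haskell
-- Multiply every element of a stream by p.
scale :: Integer -> [Integer] -> [Integer]
scale p xs = [ p*x | x <- xs ]

-- Plain merge of two increasing streams; no duplicate check is needed
-- because the streams built below never share an element.
merge :: [Integer] -> [Integer] -> [Integer]
merge (a:as) (b:bs) = if a < b then a : merge as (b:bs)
                               else b : merge (a:as) bs
merge []     bs     = bs
merge as     []     = as

-- Each prime's stream h feeds back only into itself ("chases its tail").
hamming :: [Integer] -> [Integer]
hamming primes = 1 : foldl f [] primes
  where f xs p = h where h = merge (scale p (1:h)) xs
```

Evaluating `takeWhile (< 46) (hamming [2,3,5])` reproduces the transcript above.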


4. The Paraffin Problem 


The problem is discussed in [8]. In that reference, it was solved 
using the functional language Miranda, which happens to be similar to 
Haskell. Thus our job is already done for us! Although there are more
efficient algorithms for solving this problem, they do not provide 
greater insight into understanding Haskell. Thus we will simply 
rewrite Turner’s KRC solution, with a few simplifications, in Haskell 
and refer to the paper referenced above for a detailed description of it: 


main = foldr (++) [] (map paraffin [1..])
paraffin n = quotient equiv [ [x,"H","H","H"] | x <- para (n-1) ]

para = ["H"] : map genpara [1..]
genpara n = [ [a,b,c] |
              i <- [0..(n-1)/3], j <- [i..(n-1-i)/2],
              a <- para!!i, b <- para!!j, c <- para!!(n-1-i-j) ]

equiv a b = member (equivclass a) b

equivclass x = closure_under_laws [invert, rotate, swap] [x]

invert [[a,b,c],d,e,f] = [a,b,c,[d,e,f]]
'      ("H":x)         = "H":x

rotate [a,b,c,d] = [b,c,d,a]

swap   [a,b,c,d] = [b,a,c,d]

closure_under_laws fs xs = xs ++ closure' fs xs xs
closure' fs xs ys =
    closure'' fs xs (nodups [a | f <- fs,
                                 a <- map f ys, not (member a xs)])
closure'' fs xs [] = []
'            ys = ys ++ closure' fs (xs ++ ys) ys

quotient f []    = []
'          (a:x) = a : [ b | b <- quotient f x, not (f a b) ]


5. A Doctor's Office 


Of the four problems, this is probably the least well-defined. The 
main difficulty lies in just what is meant by the verb “model” in the 
first sentence. Perhaps the most common kind of modelling is a 
simulation of the actual time/event pairs, and that is what the first 
solution (written by Joe Fasel) presented below does. However, such a 
solution removes completely the non-determinism and asynchrony of 
the problem (since they are being simulated), which conflicts 
somewhat with the statement made in the last sentence of the 
problem description. Thus we also provide a solution that uses 
explicit non-determinism. Unfortunately, non-determinism is not 
part of the Haskell standard, and thus we assume a primitive operator 
called choose which non-deterministically chooses an element from a 
list. 


The two solutions are radically different, and reflect very different 
characteristics of Haskell. 


5.1. Time/event Simulation 


This model of the doctors’ office takes as input a number of pa- 
tients, a number of doctors, an initial list of times at which patients 
get sick, and two infinite lists of durations, representing the distribu- 
tions of times that patients remain well and of the times doctors take 
to cure patients. An infinite list of tuples is returned, containing the 
following information for each office visit: 




(patient, sick-time, doctor, start-treatment-time, cure-time) 


That is, a patient number, the time the patient got sick and entered 
the patient queue, the number of the doctor assigned, the time at 
which the patient was assigned a doctor, and the time the doctor fin- 
ished treating the patient. 


The style of this solution is to create mutually recursive streams of 
time/event pairs, merging them together at appropriate places while 
preserving the temporal order. The main streams of events are pa- 
tients (patientQ), doctors (doctorQ), and cured people (cured), as 
shown below. insert and makeQ are utilities for handling queues of 
time-event pairs. 


doctors n m initialWellDist wellDist cureDist = cured

  where insert y [] = [y]        -- insert y into time-ordered queue
        '      (p',t') rest@((p,t):xs) | t' < t    = (p',t') : rest
                                       | otherwise = (p,t) : insert (p',t') xs

        makeQ (x:xs) yys         -- initial queue (in order),
                                 -- subsequent entries (not in order)
          = x : makeQ (insert y xs) ys where y:ys = yys

        patientQ                 -- [(patient, sick-time)]
          = makeQ (foldr insert [] (zip [1..n] initialWellDist))
                  [(p,c+x) | ((p,s,d,t,c),x) <- zip cured wellDist]

        doctorQ                  -- [(doctor, time-available)]
          = makeQ [(d,0) | d <- [1..m]]
                  [(d,c) | (p,s,d,t,c) <- cured]

        cured
          = [(p,s,d,t,t+x) where t = max s a
              | ((p,s),(d,a),x) <- zip3 patientQ doctorQ cureDist]
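The program above uses the syntax of the time (tick marks, and a where clause inside a list comprehension). A transcription into present-day Haskell might look like the following sketch. The type synonyms `Time` and `Visit`, the `otherwise` guard, and the `let` inside the comprehension are our assumptions about how to render the original; the mutually recursive stream structure is unchanged.

```haskell
type Time  = Int
-- (patient, sick-time, doctor, start-treatment-time, cure-time)
type Visit = (Int, Time, Int, Time, Time)

doctors :: Int -> Int -> [Time] -> [Time] -> [Time] -> [Visit]
doctors n m initialWellDist wellDist cureDist = cured
  where
    -- insert an entry into a time-ordered queue
    insert y [] = [y]
    insert (p',t') rest@((p,t):xs)
      | t' < t    = (p',t') : rest
      | otherwise = (p,t) : insert (p',t') xs

    -- emit the head of the ordered queue, then fold in the next late arrival
    makeQ (x:xs) yys = x : makeQ (insert y xs) ys
      where y:ys = yys

    -- [(patient, sick-time)]
    patientQ = makeQ (foldr insert [] (zip [1..n] initialWellDist))
                     [ (p, c+x) | ((p,_,_,_,c), x) <- zip cured wellDist ]

    -- [(doctor, time-available)]
    doctorQ  = makeQ [ (d,0) | d <- [1..m] ]
                     [ (d,c) | (_,_,d,_,c) <- cured ]

    cured    = [ (p,s,d,t,t+x)
               | ((p,s),(d,a),x) <- zip3 patientQ doctorQ cureDist
               , let t = max s a ]
```

As a hand-checked example, `take 3 (doctors 2 1 [1,2] (repeat 10) (repeat 3))` yields `[(1,1,1,1,4),(2,2,1,4,7),(1,14,1,14,17)]`: patient 1 is treated from time 1 to 4, patient 2 waits for the single doctor until time 4, and patient 1 falls sick again at time 14.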




5.2. Asynchronous Process Model 


In the following solution the “world” is modelled as a 6-tuple: 


(healthy_people,   -- list of healthy people
 sick_people,      -- queue of sick people
 being_cured,      -- list of sick-people/doctor pairs
 cured_people,     -- queue of cured-people/doctor pairs
 doctor_q,         -- queue of available doctors
 record)           -- receptionist's record of pertinent data


This representation is actually more detailed, and thus more realistic, 
than the previous one. In particular, note the presence of a record 
book as well as a queue to hold the doctor/patient pairs reporting back 
to the receptionist after a curing session (this queue is not called for 
in the specification, but seems more realistic). The initial state of the 
world should be obvious: 


initial_world = 


   ([1..n],   -- everybody is healthy
    [],       -- nobody is sick
    [],       -- nobody is being cured
    [],       -- nobody has just been cured
    [1..m],   -- every doctor is idle
    [])       -- no record of curing


The dynamics of this model are captured by three “processes” that 
operate non-deterministically (i.e., asynchronously) and in parallel. 
Each process takes as input a world and outputs a “new” world. 
Simulation of the doctors’ office proceeds by starting with the initial 
world and iteratively choosing a process non-deterministically with 
which to generate a new world on each step of the simulation. The 
result is an infinite stream of worlds. 




doctors world = choose_loop world processes

processes = [sickening_process, curing_process, receptionist]

sickening_process w@([],s,b,c,d,r) = w     -- everybody is sick!!
'                 w@(h, s,b,c,d,r) = (hs,p:s,b,c,d,r)
                                     where (p,hs) = sicken_one h

curing_process w@(h,s,[],c,d,r) = w        -- nobody being cured
'              w@(h,s,b, c,d,r) = (h,s,dps,dp:c,d,r)
                                  where (dp,dps) = cure_one b

receptionist w = choose [help_the_sick,move_the_cured] w
  where help_the_sick w@(h,[],  b,c,d,   r) = w   -- nobody is sick
        '             w@(h,s,   b,c,[],  r) = w   -- no free doctors
        '             w@(h,p:ss,b,c,d:ds,r) = (h,ss,(d,p):b,c,ds,r)

        move_the_cured w@(h,s,b,[],       ds,r) = w   -- no recent curing
        '              w@(h,s,b,(d,p):dps,ds,r) =
                           (p:h,s,b,dps,ds++[d],(d,p):r)

cure_one   = choose_and_remove   -- random curing function
sicken_one = choose_and_remove   -- random sickening function
choose_and_remove lst = (el, [y | y <- lst, y /= el])
                        where el = choose lst

choose_loop obj fs = new_obj : choose_loop new_obj fs
                     where new_obj = choose fs obj


Note that the non-deterministic utility functions are built from a single 
non-deterministic primitive called choose that non-deterministically 
selects an element from a list. 




This non-deterministic process model, by the way, could be made 
deterministic by providing lists of sickness and wellness distributions 
as in the time/event simulation. Similarly, the time/event simulation 
could be made non-deterministic by suitably merging the event 
streams non-deterministically. 


6. Skyline Matrix Solver


Our understanding of this problem was aided greatly by [3] and the 
Fortran code written by Andy Sherman which implements an envelope 
method for solving a linear system. That code, complete with docu- 
mentation, is listed in the Appendix of [4]. 


Having Sherman’s code provided us with an opportunity to study 
Fortran-style incremental array manipulations in a functional language, 
and to contrast that with the preferred monolithic array approach. We 
think the results are quite interesting. To conduct the study we first 
converted, as faithfully as possible, the Fortran code into Haskell using 
incremental updates to purely functional arrays (see [6] for a discus- 
sion of incremental arrays). We then rewrote the program in a mono- 
lithic style, adhering more closely to the matrix algebra, but using the 
same envelope representations used by Sherman. 


The incremental functional array solution is presented in [4], and 
illustrates how one could do incremental array operations in a func- 
tional language that “have the feel” of side effects to arrays in an im- 
perative language. In fact, the incremental program, when run on our 
Alpha-Tau implementation of Alfl [5], achieves the same space com-
plexity as the Fortran program. Our optimizer is able to infer that ev-
ery array is “single-threaded” and thus updates can be done destruc-
tively rather than by copying.


On the other hand, this is not the preferred way to program with 
arrays in a functional language. Haskell has a primitive data type for 
arrays together with special syntax that allows the specification of an 




array instance monolithically rather than incrementally. That is, the 
entire final array is specified in one monolithic declaration, yielding a 
declarative reading more in line with the philosophy of functional pro- 
gramming. This style of solution is presented below. 


6.1. Monolithic Array Solution 


The skyline problem illustrates well some of the special strengths 
of Haskell arrays. In particular, the array specifications can be derived 
from the original mathematical definition of the problem in a clear and 
straightforward way. The essential data dependences are clear, rather 
than obscured by extraneous operational sequencing. The recursive
definition of arrays, including mutually recursive definitions of multi-
ple arrays, permits elegant specifications as well as efficient implemen-
tations. Haskell arrays permit separate definitions for elements in
different regions of any array, which permits optimizations similar to
the lifting of computations from Fortran loops, and which clearly cor-
respond to the mathematical function domain specifications.


The incremental solution was essentially a transcribed version of
Sherman’s code, and thus we included no description of the data rep-
resentations or the algorithm. For the monolithic solution we will in-
stead start from the basics, and develop the final program via step-
wise refinement of the specification.


6.1.1 Introduction to Sherman’s envelope format for sparse matrices 


Sherman’s envelope format works best when the sparse linear sys-
tem A * x = b has its equations and variables ordered such that most of
A’s nonzeros are close to the main diagonal. Each row i of the lower
triangle is stored as an envelope from the leftmost nonzero in the row
up to the last column j = i - 1 before the diagonal. Likewise, each col-
umn j of the upper triangle is stored as an envelope from the upper-
most nonzero in the column down to the last row i = j - 1 before the
diagonal. The main diagonal itself is stored as a 1-D vector of length n.




Sherman represents a sparse matrix as the 6-tuple (n,pl,d,pu,
irl,iru) where:


• n = the order of A.


• pl, d, pu = 1-D floating point vectors representing the
  lower triangle’s consecutively stored row envelopes, the
  main diagonal elements, and the upper triangle’s con-
  secutively stored column envelopes.


• irl, iru = 1-D length-n integer vectors of base ad-
  dresses into pl and pu.


The base address vectors require some explanation. For access into
lower triangle pl, suppose we defined the vectors:


• fl!i = the column index of the first nonzero in row i.


• begin_l!i = the index into pl of row i’s first nonzero.

Then we would access a!(i,j) in the lower triangle by

    pl!(begin_l!i + j - fl!i) = a!(i,j)

But the value

    begin_l!i - fl!i


is the same for every j in row i. In a later section we will see that
computing an element (i, j) of either the lower or upper triangle factor
requires an inner product summation that runs along the lower trian-
gle’s row i and the upper triangle’s column j. For a sequential
program it is desirable to make this summation the innermost loop to
preserve locality of reference and therefore achieve good cache and
virtual memory hit rates. Therefore we would like to raise this loop-
invariant computation out of the innermost loop, replacing the O(n²)
evaluations of the expression


    begin_l!i - fl!i




by O(n) evaluations. We also save space in the representation by re- 
placing the two length-n vectors with a single length-n vector. 


    irl!i = begin_l!i - fl!i


    pl!(irl!i + j) = a!(i,j)


The value of irl!i can be thought of as the row i envelope’s base ad-
dress in pl. The “first nonzero” function fl is useful as a limit for the
summation over all j in row i, but can be easily recovered from irl.
The upper triangle’s column-oriented envelopes are stored in a
similar fashion.


6.1.2. The “first nonzero” functions


We will show how the first nonzero function fl for the row-ori-
ented envelopes in the lower triangle can be recovered from the
vector irl. A similar function fu can be derived for the column-
oriented upper triangle envelopes.


The last column stored for row i is j = i - 1. Let (pl_env_len i)
equal the row i envelope size. Then the column index of row i's first
nonzero is


    fl i = i - (pl_env_len i)


The index values (irl!(i-1) + i-2) and (irl!i + i-1) into pl point to
the end of the row i - 1 and row i envelopes respectively. Since the
envelopes are stored consecutively in pl, we have


    pl_env_len i = (irl!i + i-1) - (irl!(i-1) + i-2)

therefore,


    fl i = i - 1 + irl!(i-1) - irl!i




Since there is no irl!0 entry, fl is only defined for i <- [2..n].
The row 1 envelope is always empty in the lower triangle. For an
empty row the first nonzero is in column


    (fl i) = i

so the envelope contains columns


    j <- [(fl i)..(i-1)] = []


We could put a conditional in fl to make it defined for row 1, but
this imposes a runtime test for every row. A better alternative is to
define a bogus irl!0 that causes (fl 1) to return 1. Entry irl!1
always has the value


    irl!1 = (begin_l 1) - (fl 1) = 1 - 1 = 0

therefore irl!0 must satisfy the equation:

    1 = fl 1 = 1 - 1 + irl!0 - irl!1 = irl!0


However, we will discover later that we can always avoid any calls to fl
for row 1, or to fu for column 1.
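The relationship between the envelope layout, irl, and fl can be checked on a small case. The sketch below is not Sherman's code: the helper names `mkIrl` and `lowerAt` are hypothetical, and it assumes `Data.Array` from GHC's `array` package. It builds irl (including the bogus entry irl!0 = 1) from each row's first-nonzero column, recovers fl from irl alone, and reads a stored entry as pl!(irl!i + j).

```haskell
import Data.Array

-- Build irl for indices 0..n from the first-nonzero column of rows 1..n.
-- Entry 0 is the bogus value 1 that makes (fl 1) come out as 1.
mkIrl :: [Int] -> Array Int Int
mkIrl firstCols = listArray (0, n) (1 : zipWith (-) begins firstCols)
  where
    n      = length firstCols
    lens   = [ i - f | (i, f) <- zip [1..] firstCols ]  -- envelope sizes
    begins = scanl (+) 1 lens                            -- begin_l of each row

-- Recover the first-nonzero column of row i from irl alone.
fl :: Array Int Int -> Int -> Int
fl irl i = i - 1 + irl!(i-1) - irl!i

-- Read a!(i,j) for a stored lower-triangle entry (fl irl i <= j < i).
lowerAt :: Array Int Double -> Array Int Int -> (Int, Int) -> Double
lowerAt pl irl (i, j) = pl ! (irl!i + j)
```

For a 4 x 4 matrix whose rows have first nonzeros in columns 1, 1, 2, 1, the envelopes hold (2,1), then (3,2), then (4,1), (4,2), (4,3), so pl has five entries. `mkIrl [1,1,2,1]` has elements `[1,0,0,0,2]`, `map (fl irl) [1..4]` gives back `[1,1,2,1]`, and entry (4,2) is found at pl!4.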


6.1.3. A functional derivation of L * U factorization 


The problem is to solve the linear system A * x = b: given A and b,
what is x? If A is invertible, there exists a unique factorization A = L *
U where L is lower triangular and U is unit upper triangular, which
reduces the original problem to the easier problem of solving the tri-
angular linear systems L * y = b, U * x = y.


But this leaves the problem: given A, what are L and U? The usual
derivation of L and U is presented as a sequence of steps k ∈ [1..n],
each step forming an intermediate matrix A(k); this particular se-
quential approach to Gaussian elimination is very obscure, hiding the
essential data dependences under non-essential operational details.




Instead we will first write out the equation A = L * U as if we were
finding A given L and U, then by algebraic manipulation, derive mutu-
ally recursive equations for L and U given A. We will see that the
Haskell program closely mimics the mathematical notation we use to
derive the equations for L and U. Because of this close resemblance,
the program is easy to reason about, the essential data dependences
are clear, and it is easy to justify and debug optimizations.


Each a(i, j) is the inner product of L’s row i and U’s column j:


\[ a(i,j) = \sum_{k=1}^{n} l(i,k)\,u(k,j), \qquad i \in [1..n],\; j \in [1..n] \]


But there is no contribution to a(i, j) for terms in which l(i, k) = 0
(columns k to the right of the diagonal: i < k), or in which u(k, j) = 0
(for rows k below the diagonal: j < k). Therefore, instead of summing
over k ∈ [1..n], we only need to sum over k ∈ [1..(min i j)].


Equivalently, we can separate the definitions for a(i, j) in the lower
triangle and diagonal (i ≥ j, and therefore use j as the summation
limit):


\[ a(i,j) = \sum_{k=1}^{j} l(i,k)\,u(k,j) = l(i,j)\,u(j,j) + \sum_{k=1}^{j-1} l(i,k)\,u(k,j), \qquad i \in [1..n],\; j \in [1..i] \]


or in the upper triangle (i < j, and therefore use i as the summation
limit):


\[ a(i,j) = \sum_{k=1}^{i} l(i,k)\,u(k,j) = l(i,i)\,u(i,j) + \sum_{k=1}^{i-1} l(i,k)\,u(k,j), \qquad i \in [1..n],\; j \in [i+1..n] \]



But we can immediately rearrange these equations to define the
elements of L and U (recall that we require u(j, j) = 1.0 for all j):


\[ l(i,j) = a(i,j) - \sum_{k=1}^{j-1} l(i,k)\,u(k,j), \qquad i \in [1..n],\; j \in [1..i] \]


\[ u(i,j) = \Bigl( a(i,j) - \sum_{k=1}^{i-1} l(i,k)\,u(k,j) \Bigr) \Big/\; l(i,i), \qquad i \in [1..n],\; j \in [i+1..n] \]


The L equations show us that whatever the operational sequencing,
l(i, j) depends on a(i, j) and recursively depends on other L elements
in the same row i and to the left, and on U elements in the same col-
umn j and above. The recursion terminates upon the leftmost column
of L and the topmost row of U. Similar reasoning holds for the U
equation.


In the following we will replace l(i, i) with the name d(i). There
are optimizations we can perform on L’s diagonal elements that make
them deserve special treatment. The d(i)’s are not to be confused
with the elements of diagonal matrix D in the L * D * U factorization,
where both L and U are unit triangular.


6.1.4. L * U factorization in dense array format


For clarity, we introduce some new syntax into list comprehen-
sions. In an array function's list comprehension argument, the (i, x)
pair can be written in the form i = x, similar to the declarations in a
Haskell where. For example,


array ((1,1), (N,N))
      [ (i,j) = k * a!(i,j) | i <- [1..N], j <- [1..N] ]


From the equations in the previous section let us write a functional 
program to compute Land U. Let us define a higher-order function 
for the mathematical summation sign: 




sum i j accum f = if j < i then accum
                  else sum (i + 1) j (accum + (f i)) f


Let us define a function l that given index (i,j) computes the element
value in L:


l (i,j) = a!(i,j) - (sum 1 (j-1) 0 l_exp)
          where l_exp k = L!(i,k) * U!(k,j)


This definition of l is very similar to a Do loop in Fortran, and can be
compiled as efficiently.


Instead we will use an alternative definition of sum that operates 
over a list. 


sum xs = suml 0 xs 
where suml accum [] = accum 


suml accum (x:xs) = suml (accum + x) xs 


Used with a list comprehension argument, this version gives a some-
what more legible way of writing l i j, and more closely resembles the
mathematical summation sign.


l (i,j) = a!(i,j) - sum [ L!(i,k) * U!(k,j) | k <- [1..j-1] ]


We can think of the list as a multiset, and of sum as summing the ele- 
ments of the set (although strictly speaking, floating point addition is 
not associative). Techniques such as Wadler’s listlessness and defor- 
estation transformations can ensure that such an expression gets con- 
verted to a semantically equivalent expression in which the lists are 
eliminated; the expression can be compiled as efficiently as a Do loop 
[9,10]. 
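
As a sketch in present-day Haskell, the two summation styles above might be written as follows. The names sumRange and sumList are ours (chosen to avoid clashing with the Prelude's sum), and the Double element type is an assumption made only for illustration.

```haskell
-- Index-based summation: adds f i, f (i+1), ..., f j to the accumulator.
sumRange :: Int -> Int -> Double -> (Int -> Double) -> Double
sumRange i j accum f
  | j < i     = accum
  | otherwise = sumRange (i + 1) j (accum + f i) f

-- List-based summation with an explicit accumulator.
sumList :: [Double] -> Double
sumList = go 0
  where
    go accum []       = accum
    go accum (x : xs) = go (accum + x) xs
```

Both walk their terms from left to right, so for the same sequence of terms they add in the same order.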


This definition of l holds for


    i <- [2..n], j <- [1..i-1]




Row 1 is skipped since l (1,1) is on the diagonal and we wish to define
the diagonal elements separately.


The definition for d, which computes L’s diagonal elements, is a
simplified version of the definition for l (since i = j), for
i <- [1..n].


The definition for U is nearly the same as for L except that summation
stops at k = i - 1. Then the entire row is scaled by 1./d!i to
normalize U’s diagonal to 1.0. U’s definition holds for


    i <- [1..n-1], j <- [i+1..n]

or equivalently, for

    j <- [2..n], i <- [1..j-1]

See the complete program at the end of this section.


Each element d!i appears as a divisor in the definition for every
u!(i,j) in the same row i ((n² - n) / 2 divisions altogether), as well as
in the definition of x!i in the same row for the L · U · x = b backsolve
stage (n divisions). Since division is expensive compared with
multiplication, we instead store the inverse of each d!i, replacing O(n²)
divisions with n divisions and O(n²) multiplications. This is a classic
example of using an array to store expensive shared computations.


The definitions of L and U are mutually recursive, both in the
mathematical definition and in the Haskell array definition. We do not need
to store L’s upper or U’s lower triangle, which are zero, or U’s unit
diagonal, so we can store all the essential results in a single n × n
array. We can recursively define the matrix


    lu = (L - D) + D⁻¹ + (U - I)


in dense matrix format (where D = L’s diagonal, D⁻¹ = D’s inverse):




    plu a = lu
      where
        ((1,1), (n,n)) = bounds a

        lu = array ((1,1), (n,n))
               [ (i,j) = l i j | i <- [2..n],   j <- [1..i-1] ] ++
               [ (i,i) = d i   | i <- [1..n] ] ++
               [ (i,j) = u i j | i <- [1..n-1], j <- [i+1..n] ]
        l i j = a!(i,j) - sum [lu!(i,k) * lu!(k,j) | k <- [1..j-1]]
        d i   = 1./s
          where s = a!(i,i) - sum [lu!(i,k) * lu!(k,i) | k <- [1..i-1]]
        u i j = lu!(i,i) *
                ( a!(i,j) - sum [lu!(i,k) * lu!(k,j) | k <- [1..i-1]] )
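
The program above uses this chapter's (i,j) = e notation for index/value pairs. A minimal sketch of the same recursive definition in standard Haskell, assuming the Data.Array library and writing each pair as ((i,j), e), could look like this. The name pluDense is hypothetical, and the sketch does no pivoting.

```haskell
import Data.Array

-- Dense L*U factorization as one recursively defined, non-strict array.
-- Storage follows the text: the strict lower triangle holds L, the
-- diagonal holds the inverses of L's diagonal, and the strict upper
-- triangle holds the unit-diagonal U.
pluDense :: Array (Int, Int) Double -> Array (Int, Int) Double
pluDense a = lu
  where
    (_, (n, _)) = bounds a   -- assumes a square matrix indexed from (1,1)
    lu = array ((1, 1), (n, n)) $
           [ ((i, j), l i j) | i <- [2 .. n],     j <- [1 .. i - 1] ] ++
           [ ((i, i), d i)   | i <- [1 .. n] ] ++
           [ ((i, j), u i j) | i <- [1 .. n - 1], j <- [i + 1 .. n] ]
    l i j = a ! (i, j) - sum [ lu ! (i, k) * lu ! (k, j) | k <- [1 .. j - 1] ]
    d i   = 1 / (a ! (i, i) - sum [ lu ! (i, k) * lu ! (k, i) | k <- [1 .. i - 1] ])
    u i j = lu ! (i, i) *
            (a ! (i, j) - sum [ lu ! (i, k) * lu ! (k, j) | k <- [1 .. i - 1] ])
```

For the 2 × 2 matrix with rows (4, 2) and (6, 5), the factors are L = ((4, 0), (6, 2)) and U = ((1, 0.5), (0, 1)), so the stored array has rows (0.25, 0.5) and (6, 0.5).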


6.1.5. Lazy arrays and strict arrays 


Notice that the domain specifications in the array comprehension 
correspond exactly to the domain specifications given in the mathe- 
matical function definitions. These domain specifications should not 
be thought of as looping constructs: they say nothing about the order 
in which elements of the array lu should be evaluated. 


In fact, we could let lu be a lazy array. Each element in a lazy array
is represented by a thunk which is evaluated only when demanded. 
Domain specifications such as 


    i <- [2..n], j <- [1..i-1]


specify where thunks for a particular form of expression must be 
placed, but say nothing about the order in which the thunks are evalu- 
ated. 


Evaluation of an element in a lazy array is forced only when it is 
explicitly requested. If a lazy array is recursively defined, evaluation of 
an element in turn forces the evaluation of other elements on which it 
has a data dependence. 




But if an element has already been forced once, the thunk modifies 
itself so that its value is returned immediately, without recomputation. 
We can think of an array a as a function of d integer arguments (where
d is the array’s dimension), for which we know that any given function
application (a i_1 ... i_d) (i.e., any given array element
a!(i_1,...,i_d)) will be requested many times. In this view an array is
a caching function.
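
A small illustration of demand-driven element evaluation, assuming the standard Data.Array library (whose arrays are non-strict in their elements). The names lazyVec and firstTwo are made up for this sketch.

```haskell
import Data.Array

-- Element 2 depends on element 1; element 3 is a thunk that would raise
-- an error if forced, but it is harmless as long as nobody demands it.
lazyVec :: Array Int Integer
lazyVec = listArray (1, 3) [10, lazyVec ! 1 * 2, error "never demanded"]

-- Demanding elements 1 and 2 forces only those two thunks.
firstTwo :: Integer
firstTwo = lazyVec ! 1 + lazyVec ! 2
```

Evaluating firstTwo gives 30 without ever touching element 3.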


There are several essential differences between lazy arrays in 
Haskell and arrays in a language like Fortran. Haskell specifies the re- 
sult array monolithically in terms of a definition for each element, 
whereas Fortran specifies the result array in terms of incremental up- 
dates to the input array. For the example program presented so far, 
Haskell’s monolithic definition requires that the output array be com- 
puted in a separate space from the input array. 


For Haskell to be able to reuse the input array a to store the output 
array lu, the compiler must know that reuse is safe. There must be no 
other outstanding references to a outside the definition of lu.
Furthermore, an element a!(i,j) must be dead at the time that it is
replaced by element lu!(i,j), which means that either the compiler
must determine or the programmer must specify a safe order of
evaluation. This research topic is the subject of [1] and [2].


Another difference is that the Fortran programmer must be careful 
to arrange the order of his computation so that whenever he evaluates
an element lu!(i,j), the elements on which lu!(i,j) has a direct
data dependence will have already been computed. Both the Haskell
and the Fortran arrays can be viewed as cached functions, so although 
they may differ in the order in which array elements are evaluated, 
there is no difference in the total amount of computation time spent 
on array indexing and floating point arithmetic. But Haskell’s thunks 
increase the time by a small constant factor. In addition to computing 
the element values, we must also create a thunk for each element 
when array storage is allocated; and whenever an element is de- 
manded we must test whether or not its thunk has been forced yet. 




Notice that we could eliminate the need for creating and testing ele- 
ment thunks if, like the Fortran programmer, we could guarantee a 
safe order of evaluation. See [1,2]. 


6.1.6. Refinement of L · U factorization using “first nonzero”
information


Notice that in the summations, the k-th term

    l!(i,k) * u!(k,j)


makes no contribution if either factor is zero. If a sparse matrix is
organized such that most nonzeros are close to the main diagonal,
considerable work can be saved by ignoring terms in which either
l!(i,k) falls to the left of row i’s first nonzero or u!(k,j) falls above
column j’s first nonzero. In other words, skip terms for which


    k < (fl i) or k < (fu j)


Assume we are given the “first nonzero” functions fl and fu. L is
then defined by:


    l (i,j) = a!(i,j) - sum [lu!(i,k) * lu!(k,j) | k <- [kmin_l..j-1]]
      where kmin_l = max (fl i) (fu j)


We have defined l (i,j) for

    i <- [2..n], j <- [1..(i-1)]


But recall the domains of the first nonzero functions: (fl i) is defined
over i <- [2..n], which causes no problem, but (fu j) is defined only
over j <- [2..n], which makes (l (i,j)) undefined in column one.


We can put a run-time test in (fu j) for the case column j = 1, but
then this test gets executed for every one of the O(n²) lower triangle
elements. But we notice that whenever j is at or to the left of the




row’s first nonzero (i.e., when j <= (fl i)), the summation must
terminate immediately. Then we are left with


    (l (i,j)) = a!(i,j)

which is zero for j < (fl i), nonzero for j = (fl i).


Therefore, when we know j <= (fl i), we can return a!(i,j)
immediately, avoiding calls to the summation altogether. This case
includes column j = 1, so we do not need special treatment for this
column.


We partition the lower triangle into different regions which use 
separately tailored element definitions. 


    lu = array ((1,1), (n,n))
           [ (i,j) = l i j   | i <- [2..n], j <- [(fl i)+1..i-1] ] ++
           [ (i,j) = a!(i,j) | i <- [2..n], j <- [(fl i)], j < i ] ++
           [ (i,j) = 0.0     | i <- [2..n], j <- [1..(fl i)-1] ]


Similar partitionings hold for the diagonal and upper triangle. These 
clauses partition the lower triangle into three regions: 


1. j <- [(fl i)+1..i-1] : inside the envelope except for
   the envelope’s first column (the empty set if first
   nonzero is on or immediately beside the diagonal).


2. j <- [(fl i)], j < i : the envelope’s first column (the
   empty set if first nonzero is on the diagonal).


3. j <- [1..(fl i)-1] : outside the envelope (the empty
   set if first nonzero is in column one). These zero
   elements of a are guaranteed not to fill in for lu.
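
The three regions can be checked on small cases with a sketch like the following. The helper regions is hypothetical; it takes a row index i and the value of (fl i) and returns the column lists for regions 1, 2 and 3 in that order.

```haskell
-- Split the strict lower triangle of row i into the three regions
-- described above, given fli = fl i (the row's first nonzero column).
regions :: Int -> Int -> ([Int], [Int], [Int])
regions i fli = (inside, firstCol, outside)
  where
    inside   = [fli + 1 .. i - 1]          -- region 1
    firstCol = [ j | j <- [fli], j < i ]   -- region 2
    outside  = [1 .. fli - 1]              -- region 3
```

For example, regions 5 2 is ([3,4], [2], [1]), while a row whose first nonzero is on the diagonal, such as regions 3 3, gives ([], [], [1,2]).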




6.1.7. L · U factorization using envelope representation


Finally let us convert our definitions from dense format to envelope
format. Let input matrix a be represented by the tuple


    (n, old_pl, old_d, old_pu, irl, iru)

and result matrix lu be represented by

    (n, new_pl, new_d, new_pu, irl, iru)


The auxiliary vectors iru and irl are the same for both input matrix
a and output matrix lu. Zero elements in a that become non-zero
in lu are called fill-in. Fill-in can occur only inside the envelope; all
zeroes outside a’s envelope are guaranteed to remain zero in lu and
are therefore also outside lu’s envelope; therefore a and lu have the
same envelope structure, as represented by vectors iru and irl.


The vector bounds for new_d are simply (1,n). We can get the new
vector bounds for new_pl and similarly for new_pu by fetching the old
vector bounds: (bounds pl). Alternatively, we can observe that the
index of pl’s last element is


    irl!n + n - 1

Example substitutions for references to the input matrix a are:


When i <- [2..n], j <- [(fl i)..(i-1)], a!(i,j) becomes

    old_pl!(irl!i + j)


When i <- [1..n], a!(i,i) becomes old_d!(i)


When j <- [2..n], i <- [(fu j)..(j-1)], a!(i,j) becomes

    old_pu!(i + iru!j)
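
To make the index arithmetic concrete, here is a small sketch of how the base-address vector irl could be derived from the first-nonzero columns. It assumes that the rows of the lower envelope are packed consecutively into pl starting at index 1, which the text does not state explicitly. The helper names mkIrl, flFrom and lastIndex are hypothetical.

```haskell
import Data.Array

-- Build irl from the list [fl 1, ..., fl n], where fl i is the first
-- nonzero column of row i (fl i == i when the row starts on the diagonal).
mkIrl :: [Int] -> Array Int Int
mkIrl fls = listArray (1, n) [ e - (i - 1) | (i, e) <- zip [1 ..] ends ]
  where
    n      = length fls
    counts = [ i - f | (i, f) <- zip [1 ..] fls ]  -- stored elements in row i
    ends   = scanl1 (+) counts                     -- index of each row's last element

-- Recover fl i from irl (valid for i >= 2).
flFrom :: Array Int Int -> Int -> Int
flFrom irl i = i - 1 + irl ! (i - 1) - irl ! i

-- Index of pl's last element, as observed in the text.
lastIndex :: Array Int Int -> Int -> Int
lastIndex irl n = irl ! n + n - 1
```

With first-nonzero columns [1,1,2,2] for a 4 × 4 matrix, rows 2, 3 and 4 store 1, 1 and 2 elements, so irl is [0,0,0,1] and the last index of pl is 4.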


Similar substitutions can be made for references to the l, d, and u
regions of lu!(i,j). The definition of the lower triangle then becomes:




    l (i,j) = s
      where
        kmin_l  = max (fl i) (fu j)
        l_exp k = new_pl!(irl!i + k) * new_pu!(k + iru!j)
        a_init  = old_pl!(irl!i + j)
        s       = a_init - sum [l_exp k | k <- [kmin_l..j-1]]


    new_pl = array (bounds old_pl)
      [ irl!i + j = l (i,j)
          | i <- [2..n], j <- [(fl i)+1..i-1] ] ++
      [ irl!i + j = old_pl!(irl!i + j)
          | i <- [2..n], j <- [(fl i)], j < i ]


There are many opportunities for common subexpression elimination.
For a given element (l (i,j)), every term k in the summation
uses the same base address values irl!i and iru!j, so computing them
once would save two vector lookups per term.


Also notice that for a given row i, the expressions irl!i and (fl i)
appear both in the definition of (l (i,j)) and in the array
comprehension for new_pl. By abstracting these two expressions out of the
definition of (l (i,j)) and computing them at the level of the array
comprehension, we not only share them between these two parts of the
program, we also ensure that these expressions are computed only
once for a given row, instead of getting recomputed for each element
in the row.


Here is our final complete version of the L · U factorization, giving
the code for the lower triangle. The code for the main diagonal and
the upper triangle is similar.




    plu (n, old_pl, old_d, old_pu, irl, iru) =
        (n, new_pl, new_d, new_pu, irl, iru)
      where
        l (i,j) irli fli = s
          where
            kmin_l  = max fli (fu j)
            iruj    = iru!j
            l_exp k = new_pl!(irli + k) * new_pu!(k + iruj)
            a_init  = old_pl!(irli + j)
            s       = a_init - sum [l_exp k | k <- [kmin_l..j-1]]


        new_pl = array (bounds old_pl)
          [ (irli + j) = l (i,j) irli fli
              | i    <- [2..n],
                irli <- [ irl!i ],
                fli  <- [ (fl i) ],
                j    <- [ fli+1..i-1 ] ] ++
          [ (irli + j) = old_pl!(irli + j)
              | i    <- [2..n],
                irli <- [ irl!i ],
                fli  <- [ (fl i) ],
                j    <- [ fli ], fli < i ]


An extension of the list comprehension that treats the nesting of 
generators and loop-invariant subexpressions more clearly and ele- 
gantly is discussed in [1,2]. 


If the compiler treats i as an outer loop index and j as an inner 
loop index, we have achieved the equivalent of lifting loop-invariant 
computations in Fortran. There remain a few more opportunities for 
lifting common subexpressions in this program fragment, but we have 
taken all opportunities that lift loop-invariant subexpressions. 


6.1.8. The L · U · x = b solution phase


We will discuss two versions of the backsolve phase, one version 
using the column-oriented U, the other using the reorganized row- 
oriented U. 




Solving A · x = L · U · x = b is equivalent to solving the lower
triangular system L · y = b, then using the intermediate solution y as the
right-hand side for solving the upper triangular system U · x = y.


    l!(1,1) * (y 1)                         = b!1
    l!(2,1) * (y 1) + l!(2,2) * (y 2)       = b!2
    l!(n,1) * (y 1) + ... + l!(n,n) * (y n) = b!n


Recall that we are storing the inverse of diagonal elements under the
name


    d!i = 1. / l!(i,i)


for every i. For a typical row i, this system of equations can be recast
as the function:


    y_vec = array (1,n) [ i = y i | i <- [1..n] ]
    y i = s * d!i
      where
        s = b!i - sum [ l!(i,j) * y_vec!j | j <- [1..i-1] ]


Finally we solve the upper triangular system U · x = y, recalling
again that U is unit diagonal. The entire function, assuming the matrix


    lu = (L - D) + D⁻¹ + (U - I)


is in dense matrix format, and doing the appropriate substitutions for
l!(i,j), d!i, and u!(i,j):


    plub lu b = x_vec
      where
        y i = s * lu!(i,i) where
                s = b!i - sum [lu!(i,j) * y_vec!j | j <- [1..i-1]]
        x i = y_vec!i - sum [lu!(i,j) * x_vec!j | j <- [i+1..n]]
        y_vec = array (1,n) [ i = y i | i <- [1..n] ]
        x_vec = array (1,n) [ i = x i | i <- [1..n] ]
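
In standard Haskell the dense solver might be sketched as below, assuming Data.Array and the packed lu layout described above (inverse diagonal of L on the diagonal, unit-diagonal U above it). The name plubDense is hypothetical, and n is taken from the bounds of lu.

```haskell
import Data.Array

-- Forward substitution (yVec) followed by back substitution (xVec),
-- both expressed as recursively defined non-strict arrays.
plubDense :: Array (Int, Int) Double -> Array Int Double -> Array Int Double
plubDense lu b = xVec
  where
    (_, (n, _)) = bounds lu
    -- Multiply by the stored inverse diagonal instead of dividing.
    y i = (b ! i - sum [ lu ! (i, j) * yVec ! j | j <- [1 .. i - 1] ]) * lu ! (i, i)
    -- U has a unit diagonal, so no scaling is needed here.
    x i = yVec ! i - sum [ lu ! (i, j) * xVec ! j | j <- [i + 1 .. n] ]
    yVec = listArray (1, n) [ y i | i <- [1 .. n] ]
    xVec = listArray (1, n) [ x i | i <- [1 .. n] ]
```

With lu holding rows (0.25, 0.5) and (6, 0.5), which is the packed factorization of the matrix with rows (4, 2) and (6, 5), and b = (8, 16), the intermediate vector is y = (2, 2) and the solution is x = (1, 2).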




Now let us stay with dense format, but incorporate the “first
nonzero” functions fl and fu to avoid multiplies and subtracts for
terms in which the contribution weight l!(i,j) or u!(i,j) is outside
the matrix envelope and therefore zero. The change is trivial for the
lower triangular system, since the summation is along a row. For


    i <- [2..n], (fl i)

tells us the first nonzero column in row i:


    y i = s * lu!(i,i) where
      s = b!i - sum [lu!(i,j) * y_vec!j | j <- [(fl i)..i-1]]

Otherwise, y 1 = b!1 * lu!(1,1).


But for the upper triangle we have the problem that the summation
is also running along a row, but fu tells us the first nonzero in a given
column. We cannot use it to give a bound on the summation for a row i
the way we did for the lower triangular system. We could instead use
fu as a predicate for each element of a row to see whether that
element falls outside the upper triangle’s column-oriented envelope.


    x i = y_vec!i - sum [ lu!(i,j) * x_vec!j
                        | j <- [i+1..n], (fu j) <= i ]


    x_vec = array (1,n) [ i = x i | i <- [1..n] ]


Unfortunately, although we avoid an expensive floating point multiply
and subtract for each zero u!(i,j), we still incur a predicate test for
every element. The upper triangular envelope may be of size O(n), but
testing every element for inclusion in the envelope forces us to
perform O(n²) work.


One approach at this stage is to switch from the column-oriented view of
U, which was convenient for the factorization phase, to a row-oriented
view more appropriate to the backsolve phase. [4] gives a program
that performs this column-oriented to row-oriented reorganization of 
U’s envelope representation. The reorganization takes time propor- 




tional to the size of the upper triangle’s row-oriented envelope. For 
matrices in which the maximum size of any row envelope is indepen- 
dent of matrix size n, this time is O(n). 


Another solution is to imitate the Fortran solution, which walks
through U column by column, performing successive updates on the x
vector as each x!j becomes available. Keeping for now the dense
representation of lu, we can transform the definition of x_vec given above
to this form:


    x_vec = j_loop n y_vec
    j_loop j x_vec =
      if j < 1 then
        x_vec
      else
        j_loop (j - 1)
          (array (1,n)
            [ i = x_vec!i | i <- [1..fuj-1] ] ++
            [ i = x_vec!i - lu!(i,j) * x_vec!j | i <- [fuj..j-1] ] ++
            [ i = x_vec!i | i <- [j..n] ])
      where
        fuj = fu j


For each iteration of j we define a new monolithic array, but we can
easily transform this version to loop over i using an element-at-a-time 
incremental update function: 


(upd x_vec i new_value) 


Or we can define a function bigupdate that has the same semantics as 
the array expression above, but it is only necessary to specify the ele- 
ments that are different from x_vec. 




      else
        j_loop (j-1)
          (bigupdate x_vec
            [ i = x_vec!i - lu!(i,j) * x_vec!j | i <- [(fu j)..j-1] ])


A bigupdate function could perform an in-place update if the compiler 
determines this is safe (see [1,2]). The semantics of bigupdate takes a 
middle ground between the upd function’s incremental view of func- 
tional arrays and the array constructor’s monolithic view. 
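
The column-by-column loop can be approximated in standard Haskell with the (//) operator from Data.Array, which has the same "specify only the changed elements" flavour as bigupdate. This is only a sketch under that substitution: (//) builds a new array rather than updating in place, and the name backSolveByColumns is hypothetical. The first-nonzero function fu is passed in as an argument.

```haskell
import Data.Array

-- Walk through U column by column, from column n down to column 1,
-- subtracting each finished x!j from the entries above it.
backSolveByColumns
  :: (Int -> Int)               -- fu: first nonzero row of each column
  -> Array (Int, Int) Double    -- lu in dense packed format
  -> Array Int Double           -- y_vec, the forward-solve result
  -> Array Int Double
backSolveByColumns fu lu yVec = jLoop n yVec
  where
    (_, n) = bounds yVec
    jLoop j xVec
      | j < 1     = xVec
      | otherwise =
          jLoop (j - 1)
            (xVec // [ (i, xVec ! i - lu ! (i, j) * xVec ! j)
                     | i <- [fu j .. j - 1] ])
```

Using a dense matrix (fu = const 1), lu with rows (0.25, 0.5) and (6, 0.5), and y = (2, 2), the loop returns x = (1, 2).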


We finally convert our program to use an envelope format version by
making the appropriate substitutions to convert references to lu into
references to pl, pu, and d. Here is the final complete version of the
L · U · x = b solver.


    plub (n,pl,d,pu,irl,iru) b = x_vec
      where
        fl i = i - 1 + irl!(i-1) - irl!i
        y i = s * d!i where
                s = b!i - sum [pl!(irl!i + j) * y_vec!j
                              | j <- [(fl i)..i-1]]
        y_vec = array (1,n) [ 1 = b!1 * d!1 ] ++
                            [ i = y i | i <- [2..n] ]
        fu j = j - 1 + iru!(j-1) - iru!j
        x_vec = j_loop n y_vec


        j_loop j x_vec =
          if j < 1 then
            x_vec
          else
            j_loop (j-1)
              (bigupdate x_vec
                [ i = x_vec!i - pu!(i + iru!j) * x_vec!j
                | i <- [(fu j)..j-1] ])
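
As an illustration of the envelope indexing, the forward-substitution half of the solver might be written in standard Haskell as follows. This is a sketch assuming Data.Array; the name forwardEnvelope and the argument order are ours, and only the lower-triangle vectors pl, d and irl are needed.

```haskell
import Data.Array

-- Forward substitution L * y = b with L stored in envelope format:
-- pl holds the packed strict lower triangle, d the inverse diagonal,
-- and irl the per-row base addresses.
forwardEnvelope
  :: Int -> Array Int Double -> Array Int Double -> Array Int Int
  -> Array Int Double -> Array Int Double
forwardEnvelope n pl d irl b = yVec
  where
    fl i = i - 1 + irl ! (i - 1) - irl ! i     -- first nonzero column of row i
    y i  = (b ! i - sum [ pl ! (irl ! i + j) * yVec ! j
                        | j <- [fl i .. i - 1] ]) * d ! i
    -- Row 1 has no entries left of the diagonal, so it is handled separately.
    yVec = listArray (1, n) (b ! 1 * d ! 1 : [ y i | i <- [2 .. n] ])
```

For the 2 × 2 example with L = ((4, 0), (6, 2)), the envelope form is pl = (6), d = (0.25, 0.5), irl = (0, 0), and b = (8, 16) gives y = (2, 2).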




Andy Sherman’s version of plub also deals with reordering the
unknowns to achieve a narrower envelope. The linear system A · x = b
may have been poorly organized for the envelope representation, and
the equivalent system


    P · A · P⁻¹ · P · x = P · b


may require a smaller envelope to store A and its factorization. The
permutation matrix P reorganizes the rows (P⁻¹ the columns) using the
reverse Cuthill-McKee (RCM) heuristic to minimize the envelope size.


Sherman's version of plub assumes that the LU-factorized enve- 
lope-format matrix is in RCM order, whereas input vector b and result 
vector x are in the original order. If we have a vector iord represent- 
ing the permutation P mapping the original number i to RCM number 
iord!i, then the change to plub is trivial, and is left as an exercise. 
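
One possible shape for the two reordering steps is sketched below, assuming Data.Array and assuming that iord is a permutation of the same index range as the vectors. The helper names toRcm and fromRcm are hypothetical; a reordering plub would apply toRcm to b before solving and fromRcm to the result.

```haskell
import Data.Array

-- Move element i of a vector in original order to position iord!i.
toRcm :: Array Int Int -> Array Int a -> Array Int a
toRcm iord v = array (bounds v) [ (iord ! i, v ! i) | i <- indices v ]

-- Inverse step: fetch element i of the original order from position iord!i.
fromRcm :: Array Int Int -> Array Int a -> Array Int a
fromRcm iord v = listArray (bounds v) [ v ! (iord ! i) | i <- indices v ]
```

With iord = (2, 3, 1), the vector (10, 20, 30) becomes (30, 10, 20) in RCM order, and fromRcm restores the original.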


Acknowledgements 


We wish to thank Joe Fasel at Los Alamos for comments on various 
parts of this document, as well as for providing us with his solution to 
the doctors’ office problem. Also thanks to Los Alamos and Lawrence 
Livermore National Laboratories for their sponsorship of the Salishan 
High-Speed Computing Conference. 


This research was supported primarily by DOE grant FG02-86ER25012.


Footnotes 


1. The committee members are Arvind (MIT), Brian Boutel (Victoria
University of Wellington), Jon Fairbairn (Cambridge University),
Joseph Fasel (Los Alamos National Laboratory), Kevin Hammond
(Glasgow University), Paul Hudak (Yale University), John Hughes
(Glasgow University), Thomas Johnsson (Chalmers Institute of
Technology), Dick Kieburtz (Oregon Graduate Institute), Rishiyur
Nikhil (MIT), Simon Peyton-Jones (University College London and
Glasgow University), Mike Reeve (Imperial College), Philip Wadler
(Glasgow University), David Wise (Indiana University), and Jonathan
Young (Yale University and MIT).


2. Miranda is a trademark of Research Software Ltd.


References 


1. Anderson, S. and P. Hudak. Efficient Compilation of Haskell Array
   Comprehensions. Technical Report YALEU/DCS/RR-693, Department
   of Computer Science, Yale University, New Haven, CT,
   March 1989. (Final version appeared in Proc. ACM PLDI
   Conference, June 1990, pgs. 137-149).


2. Anderson, S. Compiling Monolithic Functional Arrays into DO
   Loops. Technical Report YALEU/DCS/TR-, Department of
   Computer Science, Yale University, New Haven, CT, 1989.


3. Eisenstat, S. C. and A. H. Sherman. Subroutines for envelope
   solution of sparse linear systems. Research Report 35, Yale
   University, New Haven, CT, October 1974.


4. Hudak, P. and S. Anderson. Haskell Solutions to the Language
   Session Problems at the 1988 Salishan High-Speed Computing
   Conference. Technical Report YALEU/DCS/RR-627, Department
   of Computer Science, Yale University, New Haven, CT, January
   1988.


5. Hudak, P. ALFL Reference Manual and Programmer’s Guide,
   Second Edition. Research Report YALEU/DCS/RR-322, Department
   of Computer Science, Yale University, New Haven, CT,
   October 1984.


6. Hudak, P. Arrays, non-determinism, side-effects, and parallelism:
   a functional perspective. In Proc. Santa Fe Graph Reduction
   Workshop, Santa Fe, NM, October 1986, pgs. 12-327.


7. Hudak, P. and P. Wadler (editors). Report on the Functional
   Language Haskell. Technical Report YALEU/DCS/RR-777, Department
   of Computer Science, Yale University, New Haven, CT, April
   1990.


8. Turner, D. A. The Semantic Elegance of Applicative Languages. In
   Proc. ACM Conference on Functional Programming Languages and
   Computer Architecture, Portsmouth, NH, October 1981, pgs. 85-92.


9. Wadler, P. Listlessness is better than laziness: lazy evaluation and
   garbage collection at compile time. In Proc. 1984 ACM Conference
   on LISP and Functional Programming, August 1984, pgs. 45-52.


10. Wadler, P. Listlessness is better than laziness ii: composing listless
    functions. In LNCS 217: Programs as Data Objects, Springer-Verlag,
    Berlin, Germany, 1985, pgs. 282-305.


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved. 169 


Id: a language with implicit parallelism 


Rishiyur S. Nikhil 
Arvind 


Massachusetts Institute of Technology 
Laboratory for Computer Science 
545 Technology Square 
Cambridge, MA 02139 


1. Introduction 


Id is a parallel programming language developed in the Computation
Structures Group at MIT’s Laboratory for Computer Science.¹ In
developing Id, we have three major goals.


High level: at least as expressive as modern functional languages and 
Lisp. Parallelism in Id is implicit—the user does not have to manage 
partitioning, scheduling, and synchronization. 


General purpose: suitable for both “scientific” and “symbolic” compu- 
tation. Id has efficient arrays and floating-point operations, as well as 
recursive data structures (e.g., lists) in an automatically managed heap. 


High performance: Our aim is for an Id program compiled for a
dataflow machine to achieve at least as much absolute performance as
its FORTRAN counterpart on a von Neumann machine built with
comparable technology.²


Id is a layered language [6,7], with layer 0 representing the cleanest 
semantics and layer 2 representing the most expressive power. Layer 
0 is purely functional, and is similar to other modern functional lan- 
guages like Miranda and Haskell, i.e., it has higher-order functions, 
non-strict semantics, polymorphic types with static type-checking by 
inference, algebraic types with pattern-matching, list comprehen- 




sions, and user-defined abstract data types. Id’s “array comprehen- 
sions” are fairly unique. 


Layer 1 adds “I-structures” to layer 0. These permit a limited form 
of assignment. One can allocate data structures with empty slots, as- 
sign values to these slots, and read values from these slots. A slot can 
be assigned a value no more than once. Reading a value from a slot is 
automatically blocked until it has been assigned a value. This addition 
sacrifices the referential transparency of layer 0, but retains determi- 
nacy, since the value read from a slot does not depend on the time 
that the program attempts to read it. The loss of referential trans- 
parency implies a certain loss in the ability to transform programs 
(e.g., for proving correctness, for program optimization, etc.); how- 
ever, the benefits are: 


* Certain programs that involve excessive copying when 
written functionally can now be written more effi- 
ciently. 


* Certain programs that must be written recursively when 
written functionally can now be written using loops, i.e., 
tail-recursively. 


Most of the programs in this chapter do not use I-structures explic- 
itly (a small use is made in the Doctor's office section). However, the 
Id compiler uses I-structures extensively to implement all data struc- 
tures, including those from layer 0. 


Layers 0 and 1 are purely determinate, i.e., answers depend only on 
inputs. Layer 2 introduces non-determinism by adding “Managers” to 
layer 1, for those applications that need it, such as shared-resource 
problems and operating systems. An example will be seen in the 
Doctor’s office section. 




It is possible to instruct the Id compiler to only accept programs 
restricted to layer 0 or layer 1, since each layer involves new syntactic 
constructs. 


1.1. A brief introduction to the language


Id programs are built up from expressions. In addition to standard
infix operators like +, Id uses juxtaposition to indicate function
application:


    f e1 ... en

Functions are defined using def:

    def clip top y = if (y > top) then top else y ;


Functions are curried, and application associates to the left, so that
the expression:


    clip 5

denotes a function of one argument that clips its argument to 5.


Data structures in Id are defined using algebraic types. However,
some data structures are so useful that they are pre-defined with
special notation. An n-tuple may be constructed by listing n expressions
separated by commas. Here is a 2-tuple:


    (a+b), (a-b)


A list is either empty (Nil), or constructed using the infix “cons”
operator (e1:e2). Destructuring of lists (testing for emptiness,
accessing the head and tail) is usually done using pattern-matching;
functions on lists are defined in several clauses:


    def length Nil    = 0                 % for empty lists
     |  length (x:xs) = 1 + length xs ;   % for non-empty lists




The clauses must have disjoint patterns. The second clause binds x 
and xs to the head and tail, respectively, of the non-empty argument 
list. 


A local scope may be created using a block. Blocks may be nested, 
with standard lexical scoping. The bindings in a block can include 
function definitions: 


    % integrate function f(x) from a to b
    def integrate f a b =
      { delta = 0.0001 ;
        def iter x s = if x > b then
                         s
                       else
                         iter (x + delta) (s + f x) ;
      In
        delta * (iter a 0) } ;


The block contains one ordinary binding and one function binding. 
The value returned by the block (and the integrate function) is the 
value of the expression following the In keyword. Note that inte- 
grate is higher-order—its first argument is itself a function. 


Although recursion subsumes iteration, Id also has while- and for- 
loop constructs. 


Like other functional languages, Id also has list comprehensions.
The following expression creates all pairs (x, y) such that y <= x and
x² + y² <= 25 (all pixels in the first octant within radius 5):


    {: (x,y) || x <- 0 to 5 & y <- 0 to x when x*x + y*y <= 25 }


Terms like x <- 0 to 5 are called generators and terms like 
when... are called filters. Generators and filters are scoped from left 
to right, i.e., they can use identifiers bound in generators to their left. 
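
For comparison, a Haskell rendering of the same comprehension might look like this (the name octantPixels is ours). The second generator uses x from the first, and the guard plays the role of Id's when filter.

```haskell
-- All integer points (x, y) with 0 <= y <= x <= 5 and x*x + y*y <= 25.
octantPixels :: [(Int, Int)]
octantPixels = [ (x, y) | x <- [0 .. 5], y <- [0 .. x], x * x + y * y <= 25 ]
```

The list has 15 elements; for instance it contains (4,3) and (5,0) but not (4,4).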




Id also has arrays, which are constructed using array comprehensions.
This expression denotes an identity matrix of size n × n:


    {matrix (1,n), (1,n)
      | [i,j] = 0 || i <- 2 to n & j <- 1 to (i-1)     % below diagonal
      | [i,i] = 1 || i <- 1 to n                       % diagonal
      | [i,j] = 0 || j <- 2 to n & i <- 1 to (j-1)}    % above diagonal


In general, arrays can have arbitrary lower and upper bounds, and they 
can be queried at run time using the bounds function. The array con- 
tents in the example are specified in three regions—below, on and 
above the diagonal, respectively. The specifications must be disjoint — 
a run time error will catch multiple definitions. The generator syntax 
is identical to that in list comprehensions. 


For efficiency reasons, we distinguish vectors, matrices, 3-dimen- 
sional arrays, etc. on the basis of type. 


1.2. Non-strict, but not lazy 


Lazy and eager evaluation are not synonymous with non-strict and
strict semantics, respectively (even though this confusion is
widespread in the literature). The former terms are concerned with
operational semantics (what interpreters do), while the latter terms
are concerned with denotational semantics (declarative meanings of
programs). Non-strictness may be achieved by lazy as well as by eager
(parallel) evaluators.


When a lazy evaluator encounters an application f (arg), no
computational resources are devoted to arg. Instead, arg is packaged into a
closure and passed to f. If f ever requires the value of arg, the
evaluator then devotes all its computational resources to it, i.e., it
invokes the closure.




On the other hand, Id achieves non-strictness with a parallel, eager
evaluator. Computational resources are shared amongst f and arg, i.e., f
is invoked in parallel with arg, passing only a place-holder for arg to f.
If f ever requires the value of arg, it blocks on the place-holder. If f
ignores its argument, it simply discards the place-holder. Eager
evaluation is thus speculative.


However, not all computation in Id is speculative. In particular, the
arms of conditionals (more generally, case expressions) are not
evaluated speculatively. After the predicate has determined which arm is
required, only that arm is evaluated. This is how we control recursion, 
and our experience has been that with this control, in almost all cases, 
the potential waste of resources due to speculative argument evalua- 
tion is not a problem. 


The advantages of Id’s eager evaluation are that 
a) we avoid the overhead of building a closure for arg and 
later invoking it, and 


b) the computation of arg is begun before it is really de- 
manded, thus increasing the parallelism and shortening 
the critical path. 


Consider the following Id expression representing an array of the first 
20 Fibonacci numbers: 


    { fibs = {array (1, 20)
               | [i] = 1                     || i <- 1 to 2
               | [i] = fibs[i-1] + fibs[i-2] || i <- 3 to 20}
      In
        fibs }


Such a recursive array definition is only possible in non-strict lan- 
guages like Id, Haskell, and Miranda, and is not possible in strict lan- 
guages like Lisp and ML. However, in this example, there is no need 




at all for lazy evaluation. In fact, the overhead of building closures for 
all the array components is likely to far outstrip the cost of computing 
the array in the first place. 
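
The same recursive array can be written in Haskell, shown here as a sketch assuming the standard Data.Array library. Note that this library is typically implemented with one suspended computation per element, which is exactly the per-component overhead discussed above; the sketch illustrates the non-strict definition, not Id's eager evaluation strategy.

```haskell
import Data.Array

-- The first 20 Fibonacci numbers as a recursively defined array.
fibs :: Array Int Integer
fibs = array (1, 20) $
         [ (i, 1) | i <- [1, 2] ] ++
         [ (i, fibs ! (i - 1) + fibs ! (i - 2)) | i <- [3 .. 20] ]
```

Here fibs ! 20 evaluates to 6765.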


The disadvantage of eager evaluation is that if a value is never de- 
manded, the resources allocated to compute it are wasted. The ex- 
treme case of this occurs when the value represents an infinite struc- 
ture. We handle these cases by special annotations that request lazy 
evaluation. These will be described in our solution to the Hamming 
problem. 


An approach currently being investigated by various researchers is 
to start with lazy evaluation as the default, and to use strictness analy- 
sis to predict where it is safe to evaluate things eagerly. However, it is 
too early to judge its effectiveness on large programs with data struc- 
tures and higher-order functions. Further, it will have no effect on our 
fibs program above, which requires non-strictness. It appears that in 
lazy languages, it will be necessary to have annotations to suggest 
“eager” evaluation. 


We note in passing that the Haskell language definition only re- 
quires non-strict semantics—it takes no position on lazy or eager 
evaluation. This is appropriate, for it leaves implementors with some 
latitude for experimentation. 


1.3. Implementations of Id 


Our efforts have concentrated on compilation of Id for dataflow ma- 
chines which, to date, have been emulated in software. A complete 
programming environment for Id, called “Id World,” is available under 
license from MIT for a small fee. It contains a compiler that translates 
Id programs into the machine code for the MIT Tagged-Token 
Dataflow Architecture (TTDA). In addition, Id World contains GITA, an 
emulator for the TTDA. Extensive instrumentation in GITA permits 
the experimenter to collect and plot various statistics such as parallelism
profiles, instruction counts, instruction mixes, resource usage
profiles, etc. 


We are currently building a real dataflow machine called Monsoon. 
An early single-processor prototype of Monsoon has been running 
compiled Id since October 1988. In collaboration with Motorola, we 
are building new, multi-processor Monsoon machines which are ex- 
pected to be available in the summer of 1991. We aim to retain the 
current Id World interface for Monsoon, so that programs can be de- 
veloped today for Monsoon. 


We have recently begun to study compilation of Id for other se- 
quential and parallel machines as well, which will increase its availabil- 
ity and value to other researchers. 


1.4. Our test runs: parallelism profiles, instruction counts and critical 
path lengths 


All the Id programs in this chapter were run on GITA, and we pre- 
sent their parallelism profiles, instruction counts, and critical path 
lengths. The emulator was run in “idealized” mode, i.e., with the fol- 
lowing assumptions: 


• All instructions take one time unit to execute.


• Each instruction executes as soon as its input data are
  ready (i.e., immediately after all its predecessor instructions
  in the dataflow graph have executed).


• It takes zero time to communicate data from an instruction
  to its successor in the dataflow graph.


The parallelism profile is a plot of the number of instructions exe- 
cuted at each time step. The total instruction count, therefore, is the 
area under the curve. The critical path length is that time step after 
which no more instructions execute. 
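
These three quantities can be illustrated on a toy dataflow graph. The following Python sketch is not GITA; the function name parallelism_profile, the dictionary encoding of the graph, and the example graph itself are assumptions made purely for illustration. It fires each instruction one step after its latest predecessor, which is what the idealized assumptions above imply.

```python
from collections import Counter

def parallelism_profile(preds):
    """preds maps each instruction to the instructions it depends on.
    Returns (profile, total_ops, critical_path_length), where
    profile[t - 1] is the number of instructions executed at step t."""
    step = {}

    def fire_time(node):
        # Unit-time instructions, zero communication delay: an instruction
        # fires one step after its latest predecessor (step 1 if it has none).
        if node not in step:
            step[node] = 1 + max((fire_time(p) for p in preds[node]), default=0)
        return step[node]

    for node in preds:
        fire_time(node)
    counts = Counter(step.values())
    critical_path = max(counts, default=0)
    profile = [counts[t] for t in range(1, critical_path + 1)]
    return profile, sum(profile), critical_path

# Hypothetical graph for the expression (a + b) * (c + d) - e
graph = {
    "a": [], "b": [], "c": [], "d": [], "e": [],
    "add1": ["a", "b"], "add2": ["c", "d"],
    "mul": ["add1", "add2"], "sub": ["mul", "e"],
}
profile, total_ops, critical_path = parallelism_profile(graph)
```

Here the total instruction count is the sum of the profile (the area under the curve), and the critical path length is the last step at which anything executes.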




While this “idealized” mode is admittedly unrealistic, it is very use- 
ful in showing what is the maximum parallelism available under some 
algorithm. As some of the problems in this chapter demonstrate, the 
parallelism of some algorithms is not at all obvious. 


2. Hamming’s problem, extended 


Our program is shown in Figure 1, with hamming_ext as the top- 
level function. 


In order to explain the solution, we begin with a solution for the 
original (simpler) Hamming problem, where the primes are limited to 
2, 3, and 5. It is a direct implementation of the observation that if h is
in the result hs, then 2h, 3h, and 5h are also in hs:


def hamming n =
  { hs = 1 : merge_2 (mapmult 2 hs)
                     (merge_2 (mapmult 3 hs)
                              (mapmult 5 hs)) ;
  In
    until_n n hs } ;


Here, hs is a stream, i.e., a potentially infinite list. mapmult takes an
integer p and a stream x1, x2, ... and produces the stream p*x1, p*x2, ....
merge_2 takes two streams in ascending order and merges them into a
new stream in ascending order, removing duplicates. until_n produces
the prefix of a stream containing just those numbers that are
≤ n. These functions are shown in Figure 1.


In each of these functions, we are dealing with potentially infinite 
lists. Thus, we use the “lazy-tail” list-constructor :# to override Id’s 
default eager evaluation. It is also possible to delay the head, or both 
the head and the tail, by using the constructors #: and #:#, respec- 
tively. The annotations are only in the constructor—component se- 
lection is identical for delayed and non-delayed components. 
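
For readers without access to Id, the effect of the lazy-tail constructor can be approximated in Python. The sketch below is an assumption-laden illustration, not a translation produced by any Id tool: the class Cons (eager head, delayed and memoized tail), the use of None for Nil, and the use of lambdas as delayed tails are all choices made here. As in the text, component selection (head, tail) looks the same whether or not the tail was delayed.

```python
class Cons:
    """List cell with an eager head and a delayed, memoized tail (cf. x :# rest)."""
    def __init__(self, head, tail_thunk):
        self.head = head
        self._thunk = tail_thunk
        self._tail = None
        self._forced = False

    @property
    def tail(self):
        # Evaluate the delayed tail on first selection and remember it.
        if not self._forced:
            self._tail = self._thunk()
            self._forced = True
            self._thunk = None
        return self._tail

def mapmult(p, xs):
    return None if xs is None else Cons(p * xs.head, lambda: mapmult(p, xs.tail))

def merge_2(xs, ys):
    # Merge two ascending streams, dropping duplicates.
    if xs is None:
        return ys
    if ys is None:
        return xs
    if xs.head < ys.head:
        return Cons(xs.head, lambda: merge_2(xs.tail, ys))
    if xs.head > ys.head:
        return Cons(ys.head, lambda: merge_2(xs, ys.tail))
    return Cons(xs.head, lambda: merge_2(xs.tail, ys.tail))

def until_n(n, xs):
    # Collect the finite prefix of elements <= n into an ordinary list.
    out = []
    while xs is not None and xs.head <= n:
        out.append(xs.head)
        xs = xs.tail
    return out

def hamming(n):
    # hs refers to itself; this works because the tail is only evaluated on demand.
    hs = Cons(1, lambda: merge_2(mapmult(2, hs),
                                 merge_2(mapmult(3, hs), mapmult(5, hs))))
    return until_n(n, hs)

def hamming_ext(primes, n):
    def f(xs, p):
        h = merge_2(mapmult(p, Cons(1, lambda: h)), xs)
        return h
    acc = None
    for p in primes:          # left fold of f over the primes, starting from Nil
        acc = f(acc, p)
    return until_n(n, Cons(1, lambda: acc))
```

Only as many cells are built as until_n actually inspects, plus the one cell whose head first exceeds n.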




def hamming_ext primes n =
  { def f xs p = { h = merge_2 (mapmult p (1:h)) xs
                 In
                   h } ;
    hs = 1 : foldl_list f Nil primes ;
  In
    until_n n hs } ;

def mapmult p Nil    = Nil
 |  mapmult p (x:xs) = (p * x) :# (mapmult p xs) ;

def merge_2 Nil    Nil    = Nil
 |  merge_2 (x:xs) Nil    = (x:xs)
 |  merge_2 Nil    (y:ys) = (y:ys)
 |  merge_2 (x:xs) (y:ys) = if (x < y) then
                              x :# merge_2 xs (y:ys)
                            else if (x > y) then
                              y :# merge_2 (x:xs) ys
                            else
                              x :# merge_2 xs ys ;

def until_n n Nil    = Nil
 |  until_n n (x:xs) = if (x <= n) then
                         x : (until_n n xs)
                       else
                         Nil ;


Figure 1 - Id program for the extended Hamming problem 


The above program is inefficient because the three streams that are
merged together contain several duplicates (e.g., 2 * 3, 3 * 2) that are
then removed during the merge. Here is a new solution that avoids
building duplicates in the first place:


def hamming n =
  { hs = 1 : { s1 = mapmult 2 (1:s1) ;
               s2 = merge_2 (mapmult 3 (1:s2)) s1 ;
               s3 = merge_2 (mapmult 5 (1:s3)) s2 ;
             In
               s3 } ;
  In
    until_n n hs } ;


Here, s1 contains all the powers of 2, s2 merges in all products with 
all powers of 3, and so on. 


Finally, we can generalize our last solution so that, instead of
working with just the primes 2, 3, and 5, it works with a list of
primes. This is shown in the function hamming_ext in Figure 1. The
function foldl_list performs the nested merge of our previous solution,


(f (...(f (f Nil p1) p2)...) pN)

It is available as a library function in Id, but it can also be defined as:


def foldl_list f z Nil    = z
 |  foldl_list f z (x:xs) = foldl_list f (f z x) xs ;
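
The nesting produced by a left fold can be made visible with a small Python sketch. It uses the standard functools.reduce; the helper show_f is a hypothetical stand-in for f that records each application as text instead of merging streams.

```python
from functools import reduce

def foldl_list(f, z, xs):
    """Left fold: f(...f(f(z, x1), x2)..., xN)."""
    return reduce(f, xs, z)

def show_f(acc, p):
    # Hypothetical stand-in for f: build a textual record of the application.
    return f"(f {acc} {p})"

nesting = foldl_list(show_f, "Nil", ["p1", "p2", "p3"])
```

With three primes, nesting is the string "(f (f (f Nil p1) p2) p3)", matching the shape shown above.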


2.1. A test run


We ran the following program on GITA, our dataflow emulator: 
hamming_ext (3:5:7:11:13:17:19:23:Nil) 5000 
with the following output: 


(1 3 5 7 9 11 13 15 17 19 21 23 25 27 33 35 39 45 49 51 55 57 63
65 69 75 77 81 85 91 95 99 105 115 117 119 121 125 133 135 143 147
153 161 165 169 171 175 187 189 195 207 209 221 225 231 243 245
247 253 255 273 275 285 289 297 299 315 323 325 343 345 351 357
361 363 375 385 391 399 405 425 429 437 441 455 459 475 483 495
507 513 525 529 539 561 567 575 585 595 605 621 625 627 637 663
665 675 693 715 729 735 741 759 765 805 819 825 833 845 847 855
867 875 891 897 931 935 945 969 975 1001 1029 1035 1045 1053 1071
1083 1089 1105 1125 1127 1155 1173 1183 1197 1215 1225 1235 1265
1275 1287 1309 1311 1323 1331 1365 1375 1377 1425 1445 1449 1463
1485 1495 1521 1539 1547 1573 1575 1587 1615 1617 1625 1683 1701
1715 1725 1729 1755 1771 1785 1805 1815 1859 1863 1875 1881 1911
1925 1955 1989 1995 2023 2025 2057 2079 2093 2125 2145 2185 2187
2197 2205 2223 2261 2275 2277 2295 2299 2375 2401 2415 2431 2457
2475 2499 2527 2535 2541 2565 2601 2625 2645 2673 2691 2695 2717
2737 2783 2793 2805 2835 2873 2875 2907 2925 2975 3003 3025 3059
3087 3105 3125 3135 3159 3179 3185 3211 3213 3249 3267 3289 3315
3325 3375 3381 3465 3519 3549 3553 3575 3591 3645 3675 3703 3705
3757 3773 3795 3825 3861 3887 3927 3933 3969 3971 3993 4025 4095
4125 4131 4165 4199 4225 4235 4275 4301 4335 4347 4375 4389 4455
4459 4485 4563 4617 4641 4655 4675 4693 4719 4725 4761 4807 4845
4851 4875 4913)


which is not in Id syntax because GITA, written in Lisp, simply prints
out the Lisp value of the result. The parallelism profile generated is
shown in Figure 2.


Figure 2 - Parallelism profile for (hamming_ext (3: 5: ... :23: nil) 5000)
(vertical axis: ops; horizontal axis: steps; Crit. path length: 9,384; Total operations: 116,739)


3. The paraffins problem 


Turner's original solution [9] was written in the language KRC. It 
can be transcribed practically verbatim into Id, since the functional 
core of Id is similar to KRC (including list comprehensions), and Id 
shares the same non-strict semantics as KRC. However, that solution
is quite inefficient, because it generates many duplicates only to filter
them out later. In [2], we showed an efficient program that avoids
generating duplicates in the first place, using the canonical tree-enumeration
techniques described in [4] (and discovered independently
by S. K. Heller, our co-author in [2]). That solution is repeated here,
and is shown in two parts: Figure 3 shows the code for the sub-problem
of generating radicals, and Figure 4 shows the generation of paraffins,
with top-level function paraffins_until. For more details, including
a discussion of the development of the solution, please see [2].


type radical = H | C radical radical radical ;

def 3_partitions m = {: (i,j,k) || i <- 0 to floor (m/3)
                                 & j <- i to floor ((m-i)/2)
                                 & k = m - (i + j) } ;

def remainders Nil    = Nil
 |  remainders (r:rs) = (r:rs) : (remainders rs) ;

def radical_generator n =
  { radicals = {array (0,n)
                | [0] = H:Nil
                | [j] = rads_of_size_n radicals j || j <- 1 to n}
  In
    radicals } ;

def rads_of_size_n radicals n =
  {: C ri rj rk || (i,j,k) <- 3_partitions (n-1)
                 & ri:ris <- remainders (radicals[i])
                 & rj:rjs <- remainders (if (i == j) then ri:ris
                                         else radicals[j])
                 & rk     <- if (j == k) then rj:rjs
                             else radicals[k] } ;


Figure 3 - Id program for generating radicals
for the paraffins problem




3.1. Radicals 


A radical is a paraffin with a single hydrogen atom removed, i.e., a
molecule with formula C_nH_{2n+1}. The structure of such molecules can
be recursively described as either

• a hydrogen atom, or

• a carbon atom attached to three other radicals.

This is expressed in the radical type-declaration in Figure 3. It consists
of two disjuncts with constructors H and C, respectively. In the
latter case, there are three components, each of which is itself of type
radical.


By way of illustration, here are some more examples of algebraic 
type declarations. The type of booleans can be declared: 


type bool = False | True ; 

The type of binary trees with integers in the nodes can be declared: 
type tree = Leaf | Node int tree tree ; 

And, the type of lists can be declared: 
type (list *0) = Nil | Cons *0 (list *0) ; 


The list type is polymorphic because it is parameterized by a type vari- 
able *0, so that we can have lists of integers, lists of booleans, lists of 
lists of integer-to-integer functions, etc. 


3.2. Generating radicals 


Suppose we wish to generate all radicals of size n. For n > 0, the
radical will have one carbon as its “root” carbon, and three sub-radicals
of collective size n - 1. Thus, we need to partition n - 1 into three
sizes in order to generate the sub-radicals. However, if we use all
possible 3-partitions of n - 1, we will generate many duplicate radicals
because the partition (i, j, k) is equivalent to (i, k, j), (j, i, k), and so
on. We can avoid this by insisting that i ≤ j ≤ k. The function
3_partitions generates a list of all 3-partitions of m
(= n - 1) in this canonical order.


The generation of radicals of size n can be defined recursively.
When n = 0, there is only one such radical: a lone hydrogen atom.
When n > 0, we construct all canonical 3-partitions (i, j, k) of n - 1; for
each such partition, we generate, recursively, all radicals ri of size i, all
radicals rj of size j and all radicals rk of size k, and construct the new
radical (C ri rj rk). Here is the function:


def rads_of_size_n n =
  if (n == 0) then
    H:Nil
  else
    {: C ri rj rk || (i,j,k) <- 3_partitions (n-1)
                   & ri <- rads_of_size_n i
                   & rj <- rads_of_size_n j when (le? ri rj)
                   & rk <- rads_of_size_n k when (le? rj rk) } ;


where le? is some function that checks that its two radical arguments
are in canonical order. The first when clause takes care of the following
situation: when i = j, since ri and rj range over all possible pairs
of radicals of size i, they may not be in canonical order; the when
clause filters out these pairs. Similarly, the second when clause filters
out duplicates when j = k.


We can avoid this generation of duplicates and filtering as follows.
For each i, let ri range over r_i1, r_i2, .... Then, when i = j and
ri is, say, r_i4, we make rj range over r_i4, r_i5, .... Thus, for each element
of the list, we would like to have access not only to that element, but
also to the remainder of that list.


In particular, we need a function that, given r_i1, r_i2, r_i3, ..., produces
the list of lists:


(r_i1, r_i2, r_i3, r_i4, r_i5, ...), (r_i2, r_i3, r_i4, r_i5, ...), (r_i3, r_i4, r_i5, ...), ...

The function that performs this is called remainders (Figure 3).


Now, we can improve our rads_of_size_n function:


def rads_of_size_n n =
  {: C ri rj rk || (i,j,k) <- 3_partitions (n-1)
                 & ri:ris <- remainders (rads_of_size_n i)
                 & rj:rjs <- remainders (if (i == j) then ri:ris
                                         else rads_of_size_n j)
                 & rk     <- if (j == k) then rj:rjs
                             else rads_of_size_n k } ;


However, there is still a major inefficiency in this function. It has the 
classical Fibonacci recursion, because to compute radicals of, say, size 
3, we compute radicals of size 0, 1, and 2, but to compute radicals of 
size 2, we compute radicals of size 0 and 1, and so on. In other words, 
we recompute radicals of each size too often. 


We use a standard trick—use an array to cache, at index n, the list 
of radicals of size n, and just look up this array each time we need 
radicals of size n. In Figure 3, radical_generator is the function that 
constructs this array, using rads_of_size_n to compute each compo- 
nent. And, rads_of_size_n itself uses this array to find radicals of size 
<n. 
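
The combination of canonical partitions, the remainders trick, and the cache can be tried out in Python. This is only a sketch under stated assumptions: radicals are represented as the string "H" or a tuple ("C", ri, rj, rk), an ordinary Python list stands in for the Id array, and the names three_partitions and rads_of_size_n are transliterations chosen here.

```python
def three_partitions(m):
    # All (i, j, k) with i <= j <= k and i + j + k == m.
    return [(i, j, m - i - j)
            for i in range(m // 3 + 1)
            for j in range(i, (m - i) // 2 + 1)]

def remainders(xs):
    # [x1, x2, x3] -> [[x1, x2, x3], [x2, x3], [x3]]
    return [xs[t:] for t in range(len(xs))]

def rads_of_size_n(radicals, n):
    out = []
    for i, j, k in three_partitions(n - 1):
        for ris in remainders(radicals[i]):
            ri = ris[0]
            # When two sizes coincide, draw the next radical only from the
            # remainder of the list, so no unordered pair is produced twice.
            for rjs in remainders(ris if i == j else radicals[j]):
                rj = rjs[0]
                for rk in (rjs if j == k else radicals[k]):
                    out.append(("C", ri, rj, rk))
    return out

def radical_generator(n):
    radicals = [["H"]]                      # index 0: the lone hydrogen atom
    for size in range(1, n + 1):
        # Every lookup inside rads_of_size_n is for a size < current size,
        # so the cache is always filled in time.
        radicals.append(rads_of_size_n(radicals, size))
    return radicals

radical_counts = [len(rs) for rs in radical_generator(5)]
```

Because the Python list is filled in increasing order of size, no non-strictness is needed here; in Id the array can be defined recursively in one expression.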


3.3. Paraffins from radicals 
Or, Molotov cocktails? 


In order to define a canonical form for paraffins, we observe that
every paraffin of size n has either


• a unique bond center, i.e., a bond with radicals of size n/2
  on both sides, or


• a unique carbon center, i.e., a carbon attached to 4
  radicals, each of size < n/2.


This is expressed in the paraffin type-declaration in Figure 4. We
can then define our canonical representation as either (BCP r1 r2)
where r1 ≤ r2, or (CCP r1 r2 r3 r4) where r1 ≤ r2 ≤ r3 ≤ r4.


Bond-centered paraffins can be enumerated using the function
BCP_generator, whose first argument should be the array of radicals
defined in the previous section, i.e., from the list of radicals of size n/2,
we draw all pairs r1 and r2 such that r2 does not precede r1 in the
list.


To enumerate carbon-centered paraffins in canonical order, we
follow a strategy similar to the one used to produce radicals. For a
paraffin of size n, one carbon is the center, so we find all canonical 4-partitions
(i, j, k, l) of n - 1, representing the sizes of the four attached
radicals (i.e., i ≤ j ≤ k ≤ l). For each of these 4-partitions, we
take all the radicals {ri}, {rj}, {rk} and {rl} of sizes i, j, k, and l, respectively.
We use the same “remainders” trick to take care of avoiding
duplicates when i = j, j = k, and k = l. Using these radicals, we
form the paraffins (CCP r1 r2 r3 r4).


Finally, our top-level function paraffins_until takes a numeric argument
n and generates an array of size n such that index j
contains a pair of lists: a list of all bond-centered paraffins of size j
and a list of all carbon-centered paraffins of size j. Each list is in
canonical order. It would be easy to flatten this into a single list, if
desired.
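
The enumeration described above can be cross-checked with a short Python transliteration. It is a sketch, not the Id program: the tuple representation of radicals and paraffins, the use of a dictionary in place of the Id array, and the integer form (m + 1) // 2 of ceiling(m/2) are assumptions made here. The radical generator is repeated in compact form so that the block runs on its own.

```python
def remainders(xs):
    return [xs[t:] for t in range(len(xs))]

def radical_generator(n):
    radicals = [["H"]]
    for size in range(1, n + 1):
        m, rads = size - 1, []
        for i in range(m // 3 + 1):
            for j in range(i, (m - i) // 2 + 1):
                k = m - i - j
                for ris in remainders(radicals[i]):
                    for rjs in remainders(ris if i == j else radicals[j]):
                        for rk in (rjs if j == k else radicals[k]):
                            rads.append(("C", ris[0], rjs[0], rk))
        radicals.append(rads)
    return radicals

def bcp_generator(radicals, n):
    # Bond-centered: two radicals of size n/2, r2 never preceding r1.
    if n % 2 == 1:
        return []
    return [("BCP", r1s[0], r2)
            for r1s in remainders(radicals[n // 2]) for r2 in r1s]

def four_partitions(m):
    # i <= j <= k <= l, i + j + k + l == m, and l <= m/2.
    return [(i, j, k, m - i - j - k)
            for i in range(m // 4 + 1)
            for j in range(i, (m - i) // 3 + 1)
            for k in range(max(j, (m + 1) // 2 - i - j), (m - i - j) // 2 + 1)]

def ccp_generator(radicals, n):
    out = []
    for i, j, k, l in four_partitions(n - 1):
        for ris in remainders(radicals[i]):
            for rjs in remainders(ris if i == j else radicals[j]):
                for rks in remainders(rjs if j == k else radicals[k]):
                    for rl in (rks if k == l else radicals[l]):
                        out.append(("CCP", ris[0], rjs[0], rks[0], rl))
    return out

def paraffins_until(n):
    radicals = radical_generator(n // 2)
    return {size: (bcp_generator(radicals, size), ccp_generator(radicals, size))
            for size in range(1, n + 1)}

paraffin_counts = [len(b) + len(c) for b, c in paraffins_until(8).values()]
```

The lower bound on k is what keeps the largest radical at most half the size of the remaining carbons, so that the chosen carbon really is the unique center.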




type paraffin = BCP radical radical |
                CCP radical radical radical radical ;

def BCP_generator radicals n =
  if (odd? n) then
    Nil
  else
    {: BCP r1 r2 || r1:r1s <- remainders (radicals[floor (n/2)])
                  & r2 <- r1:r1s } ;

def 4_partitions m =
  {: (i,j,k,l) || i <- 0 to floor (m/4)
                & j <- i to floor ((m-i)/3)
                & k <- (max j (ceiling (m/2-i-j))) to
                       (floor ((m-i-j)/2))
                & l = m - (i+j+k) } ;

def CCP_generator radicals n =
  {: CCP ri rj rk rl || (i,j,k,l) <- 4_partitions (n-1)
                      & ri:ris <- remainders (radicals[i])
                      & rj:rjs <- remainders (if (i==j) then ri:ris
                                              else radicals[j])
                      & rk:rks <- remainders (if (j==k) then rj:rjs
                                              else radicals[k])
                      & rl     <- if (k==l) then rk:rks
                                  else radicals[l] } ;

def paraffins_until n =
  { radicals = radical_generator (floor (n/2)) ;
  In
    {array (1,n)
     | [j] = (BCP_generator radicals j),
             (CCP_generator radicals j) || j <- 1 to n} } ;


Figure 4 - Id program for the paraffins problem 


Figure 5 - Parallelism profile for (paraffins_until 15)
(horizontal axis: steps; Crit. path length: 2,966; Total operations: 444,143)


3.4. A test run


The parallelism profile for (paraffins_until 15) is shown in Figure 5.
The number of paraffins containing n carbons, for n = 1, 2,
..., 15 are 1, 1, 1, 2, 3, 5, 9, 18, 35, 75, 159, 355, 802, 1858 and
4347, respectively. The parallelism of this program is not at all obvious
from the algorithm.


4. A doctor's office


It is not clear which of the following programs is requested in the
problem statement.


• A determinate simulation in which the non-determinism
  of the real world is modelled by an oracle that is a
  parameter to the program; or


• A program that is itself non-deterministic.


If the intent is really to simulate a doctor's office, then the former 
program makes more sense, because it is repeatable and we can con- 
trol the choices made by the oracle. However, if the intent is to see 
how non-determinism is handled by the programming language (for 
example to evaluate its suitability for operating systems code), then
the latter program makes more sense. Accordingly, we have developed
solutions for both interpretations.


We begin by discussing how (pseudo-)random numbers may be gen- 
erated and used in Id. 


4.1. Random numbers 


Here is a function for the linear congruence method of generating
random numbers. Given a seed X, it returns a random number r in
the range 0 to 1 and a new seed X'.


def rand_fn X =
  { a = 25173 ; c = 13849 ; m = 65536 ;
    r = X / (m - 1) ;
    X' = mod (a*X + c) m ;
  In
    r, X' } ;


Streams of random numbers: We can use this function, for example, to
produce a stream of random numbers, given an initial seed X0:


def mk_random_stream X0 =
  { def mk_rs X = { r,X' = rand_fn X
                  In
                    r :# mk_rs X' } ;
  In
    mk_rs X0 } ;


Here, we have used the “lazy-tail” list constructor :# to delay the con- 
struction of the tail of the stream, since we are likely to use only a 
small prefix. For more efficiency, it is possible to mix regular and lazy 
list constructors so that, for example, the stream is delayed at every 
100-th element. 
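
The same generator and stream can be sketched in Python, where a generator function plays the role of the lazy-tail constructor: each value is produced only when the consumer asks for it. The constants are those of rand_fn above; the use of a Python generator and of itertools.islice to take a prefix are choices made for this illustration.

```python
from itertools import islice

def rand_fn(x):
    """One step of the linear congruence method: returns (r, next seed)."""
    a, c, m = 25173, 13849, 65536
    r = x / (m - 1)            # scale the seed into the range 0 to 1
    x_next = (a * x + c) % m
    return r, x_next

def mk_random_stream(x0):
    # Conceptually infinite; values are computed on demand.
    x = x0
    while True:
        r, x = rand_fn(x)
        yield r

first_three = list(islice(mk_random_stream(1), 3))
```

Only the three requested numbers are computed, however long the stream conceptually is.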




Picking a random element of a list: Given a random number r in the
range 0 to 1, we can use the following function to pick a random
member of a list. pick_random not only returns the random element,
but also the other members of the list:


def pick_random r xs =
  { n = round (r * (length xs - 1)) ;
    def separate j (x:xs) = if (j == n) then x,xs
                            else { x',xs' = separate (j+1) xs
                                 In
                                   x', (x:xs') }
  In
    separate 0 xs } ;
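
A Python counterpart of pick_random is sketched below. It uses list slicing instead of the recursive helper separate, and it assumes that Python's built-in round is an acceptable stand-in for Id's round; the two may differ on exact halves, since Python rounds halves to the nearest even integer.

```python
def pick_random(r, xs):
    """Return (chosen element, list of the remaining elements)."""
    n = round(r * (len(xs) - 1))      # index selected by r in the range 0 to 1
    return xs[n], xs[:n] + xs[n + 1:]

chosen, others = pick_random(0.5, ["P0", "P1", "P2"])
```

The remaining elements keep their original relative order, as they do in the Id version.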


Picking a random event: Suppose we want to choose one of three 
possible events with probabilities 1%, 54%, and 45%, respectively. 
Here is a function we can use: 


def choose_event r = if (r <= 0.01) then 0
                     else if (r <= 0.55) then 1
                     else 2 ;


where, again, r is a random number in the range 0 to 1. 


4.2. A determinate simulator for the doctor's office


Our determinate solution is shown in Figure 6. The state of the sys- 
tem at any time can be modelled by seven items: 


1. wellps, a collection of well patients.

2. waitps, a queue of sick patients waiting for doctors.

3. waitds, a queue of doctors waiting for sick patients.

4. consults, a collection of patient-doctor pairs (in
   consultation).

5-7. Three “histories:” hist1, a list of patients as they fall
   sick; hist2, a list of patient-doctor pairs as they go
   into consultation; and hist3, a list of patients as they
   get cured.


def sick_event r (wellps,consults,waitps,waitds,hist1,hist2,hist3) =
  if (wellps == Nil) then
    (wellps,consults,waitps,waitds,hist1,hist2,hist3)
  else
    { (p,wellps') = pick_random r wellps
    In
      if (waitds == Nil) then
        (wellps',consults,(waitps ++ p:Nil),waitds,p:hist1,hist2,hist3)
      else
        { (d:waitds') = waitds
        In
          (wellps',(p,d):consults,
           waitps,waitds',p:hist1,(p,d):hist2,hist3) }} ;

def cure_event r (wellps,consults,waitps,waitds,hist1,hist2,hist3) =
  if (consults == Nil) then
    (wellps,consults,waitps,waitds,hist1,hist2,hist3)
  else
    { (p,d),consults' = pick_random r consults
    In
      if (waitps == Nil) then
        (p:wellps,consults',waitps,waitds ++ d:Nil,hist1,hist2,p:hist3)
      else
        { (p':waitps') = waitps
        In
          (p:wellps,(p',d):consults',
           waitps',waitds,hist1,(p',d):hist2,p:hist3) }} ;

def process randoms (wellps,consults,waitps,waitds,hist1,hist2,hist3) =
  { r1:r2:randoms' = randoms ;
    e = choose_event r1
  In
    {case e of
       0 = reverse hist1, reverse hist2, reverse hist3
     | 1 = process randoms' (sick_event r2 (wellps,consults,waitps,
                                            waitds,hist1,hist2,hist3))
     | 2 = process randoms' (cure_event r2 (wellps,consults,waitps,
                                            waitds,hist1,hist2,hist3))}} ;

def simulate randoms patients doctors =
  process randoms (patients,Nil,Nil,doctors,Nil,Nil,Nil) ;


Figure 6 - Determinate program for the doctor’s office problem


There are only two kinds of events that drive the system: 


1. A sick_event: Some well patient p falls sick. Of course,
this can only happen if wellps is non-empty. 


Effects: We use r, a random number in the range 0 to 1 
to choose which patient falls sick. If a doctor d is 
available (waitds is non-empty), p and d go into con- 
sultation (consults); otherwise, p joins the queue 
waitps. 


We record that p fell sick in hist1. If a consultation (p,
d) began, we record it in hist2.


2. A cure_event: Some consultation terminates (doctor d
cures patient p). Of course, this can only happen if 
consults is non-empty. 


Effects: We use r, a random number in the range 0 to 1 
to choose which patient-doctor consultation terminates. 
The patient p rejoins the well patients (wellps). If 
there is a waiting patient p' in waitps the doctor d goes 
into consultation with p'; otherwise, the doctor rejoins 
the queue waitds. 


We record that p was cured in hist3. If a consultation 
(p', d) began, we record it in hist2. 


These state transitions are encoded in the functions sick_event 
and cure_event (all the state components are modelled as lists). Each 
function takes a random number and a state and produces a new state. 
The sick-patients and free-doctors queues are represented as lists, 
where the first element of the list represents the head of the queue.
Thus, enqueuing is performed using ++, the built-in infix list-appending
operator. It should be observed that the histories are constructed
in reverse order. We shall reverse them at the end of the simulation. 


simulate is the top-level driver. It takes a stream of random num- 
bers (in the range 0 to 1), the initial list of patients and the initial list 
of doctors, and calls process, passing in empty lists for consults,
waitps, hist1, hist2, and hist3.


The function process chooses an action randomly: stop, a sick- 
event, or a cure-event. In the first case, it reverses the histories and 
returns them as the final result. In the latter two cases, it applies the 
appropriate state-transition function, and recursively calls process on 
the new state. 
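
The state-transition structure just described can be exercised with a small Python sketch. It is an illustration under assumptions, not the Id program: Python lists replace Id lists, histories are appended in order rather than built in reverse, the loop replaces the recursive call, and pick_random and choose_event are repeated so that the block runs on its own.

```python
def pick_random(r, xs):
    n = round(r * (len(xs) - 1))
    return xs[n], xs[:n] + xs[n + 1:]

def choose_event(r):
    # 0 = stop (1%), 1 = sick event (54%), 2 = cure event (45%)
    return 0 if r <= 0.01 else (1 if r <= 0.55 else 2)

def sick_event(r, state):
    wellps, consults, waitps, waitds, h1, h2, h3 = state
    if not wellps:
        return state
    p, wellps2 = pick_random(r, wellps)
    if not waitds:                       # no free doctor: p joins the queue
        return (wellps2, consults, waitps + [p], waitds, h1 + [p], h2, h3)
    d, waitds2 = waitds[0], waitds[1:]   # first waiting doctor takes p
    return (wellps2, [(p, d)] + consults, waitps, waitds2,
            h1 + [p], h2 + [(p, d)], h3)

def cure_event(r, state):
    wellps, consults, waitps, waitds, h1, h2, h3 = state
    if not consults:
        return state
    (p, d), consults2 = pick_random(r, consults)
    if not waitps:                       # nobody waiting: d rejoins the queue
        return ([p] + wellps, consults2, waitps, waitds + [d], h1, h2, h3 + [p])
    p2, waitps2 = waitps[0], waitps[1:]  # d moves on to the next patient
    return ([p] + wellps, [(p2, d)] + consults2, waitps2, waitds,
            h1, h2 + [(p2, d)], h3 + [p])

def simulate(randoms, patients, doctors):
    state = (list(patients), [], [], list(doctors), [], [], [])
    rs = iter(randoms)
    while True:
        r1, r2 = next(rs), next(rs)      # one number picks the event, one the actor
        e = choose_event(r1)
        if e == 0:
            return state[4], state[5], state[6]
        state = (sick_event if e == 1 else cure_event)(r2, state)

# Hypothetical oracle: sick, sick, cure, stop
histories = simulate([0.3, 0.0, 0.3, 0.0, 0.9, 0.0, 0.005, 0.0],
                     ["P0", "P1"], ["D0"])
```

Because the random numbers are an explicit argument, running simulate twice on the same list gives the same histories, which is the sense in which this simulation is determinate.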


4.2.1. A test run 


We ran the following test program: 


def test (patients,doctors,seed) =
  { randoms = mk_random_stream seed ;
    (ps,pds,cs) = simulate randoms patients doctors ;
  In
    (length ps), (length pds), (length cs), ps, pds, cs } ;


supplying a list of 20 patients (P0 through P19), a list of 4 doctors (D0
through D3) and an initial random seed. The first three components
of test’s resulting 6-tuple showed that during the simulation, patients 
fell sick 92 times, doctors were paired with patients 76 times, and 
doctors cured patients 72 times. 


It is clear from the parallelism profile shown in Figure 7 that the 
program does not have much parallelism. 


Figure 7 - Parallelism profile for deterministic Doctor program
(20 patients, 4 doctors)
(horizontal axis: steps; Crit. path length: 11,166; Total operations: 184,612)


4.3. Non-deterministic programming in Id 


As mentioned in the introduction, Id is a layered language. The first 
two layers (functional, and I-structures) retain determinacy—results 
depend only on inputs. However, in operating systems code and other 
applications, we may require non-deterministic behavior. The doctor's 
office problem may be viewed as an abstraction of the resource man- 
ager problem in operating systems. The doctors may be regarded as 
resources that are demanded and held non-deterministically by client 
processes (patients). 


To express such computations, Id has another layer called 
“managers.” Managers are a relatively recent addition to Id. Although 
the technical ideas behind managers are quite well understood, the 
notation is still experimental and is not yet part of the Id manual. We 
have a prototype implementation, and a test run of the program de- 
scribed in the next section was executed on this implementation. 


There have been various attempts to introduce non-determinism 
into functional languages. Perhaps the most common approach is to 
use a special non-deterministic merge operation: given two stream in- 
puts, merge produces an interleaved output, where the interleaving is 
non-deterministic. In an actual implementation, the interleaving is
typically done in the temporal order in which elements of the input
streams become available. In order to distribute results from a re- 
source manager to requestors, it is necessary to: tag the requests in 
the input-streams with unique stream-identifiers before merging them 
non-deterministically; carry these tags along with the resource-alloca- 
tion computation so that they identify which result is meant for which 
requestor, and split and distribute the output stream according to 
these tags. 


This approach to non-determinism is highly unsatisfactory for sev- 
eral reasons. First, it is very difficult to use when the number of 
streams to be merged (number of users of a resource) is not manifest, 
leading to a “spaghetti” of tagging and plumbing. Second, it is very 
difficult to follow a static type discipline, because all the different 
kinds of requests to a resource manager, each with different argu- 
ments, must be merged into a single homogeneously typed stream. 


A more elegant attempt is described in [1], from where we have in- 
herited the term “managers.” A manager was specified as a stream- 
to-stream function, and the tagging and non-deterministic merge at 
the entry to the manager was hidden with clever notation and clever 
implementation. While eliminating the “plumbing” problem of ex- 
plicit non-deterministic merges, the static-type-checking issue still 
remained. 


The Id manager construct not only solves all these problems, but 
also lends itself to very efficient implementation. 


Non-determinism and side-effects are closely related—each can be 
used to simulate the other. However, managers are an attempt to fa- 
cilitate disciplined use of side-effects in the presence of parallelism. 
The manager construct declares a new, abstract type, i.e., objects of
this new type may only be manipulated by the set of interface procedures
(called handlers) specified in the manager construct. However,
unlike ordinary abstract types, these objects have state that may be
updated by the handlers. Thus, each handler not only specifies a result
to be returned, but also possibly a new state for any objects given
to it as arguments. Manager semantics guarantee that the state transi- 
tions on an object appear to be atomic. 


We begin with a simple random-number generator manager that 
uses rand_fn, the linear congruence method from Section 4.1: 


manager rand_supplier = Cons_cell float
  { def make_rand_supplier seed = Cons_cell seed ;
    def next_rand (Cons_cell seed) = { r, seed' = rand_fn seed ;
                                       new seed = seed'
                                     In
                                       r } ;
  } ;


The first line is similar to an algebraic type definition. It defines a 
new type, rand_supplier, and specifies the constructors for this 
type—here, just one unary constructor Cons_cell. By using the 
keyword manager instead of type, we indicate (a) that it is 
updateable and (b) that the constructor Cons_cell is visible only in the
statements that follow between braces, thus making rand_supplier an 
abstract type with respect to the rest of the program. The statements 
in braces consist of two handlers. The first is a constructor of new 
rand_supplier objects. Given a seed, it creates a new object contain- 
ing the seed, and returns the new object as its result. This object is 
first-class; i.e., in the rest of the program, it can be an argument or re- 
sult of a procedure, it can be stored in data structures, etc. However, 
because Cons_cell is not visible outside the manager declaration, the
object is opaque to the rest of the program. To do anything that re- 
quires manipulation of the internal state (reading it or updating it) it 
has to be passed to one of the handlers, which are the only procedures 
that can examine and update the internal state. 




The second handler, when applied to a rand_supplier object, ap- 
plies rand_fn to the old seed value to produce a random number r and 
a new seed, seed’. The update to the state is specified by the binding 
that uses the keyword new. The procedure returns the value r. 


Because a rand_supplier object is a first-class object, there may be 
many references to it. Thus, there may be many concurrent attempts 
to apply next_rand to it. Managers guarantee that such concurrent ac- 
cesses are serializable, even though the user has not mentioned any 
locking or synchronization. The reading-out of the seed, application of 
rand_fn to it, and storing of the new seed is performed as an atomic 
action. Thus, two concurrent accessors can never see the same seed. 
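
The atomicity guarantee can be imitated in Python with an explicit lock. The sketch below is only an analogy: the class RandSupplier and its lock are assumptions made here, and unlike an Id manager the locking is written out by hand. It shows the observable property stated above, namely that concurrent callers never receive a number derived from the same seed.

```python
import threading

def rand_fn(x):
    a, c, m = 25173, 13849, 65536
    return x / (m - 1), (a * x + c) % m

class RandSupplier:
    """Opaque object whose seed can only be touched through next_rand."""
    def __init__(self, seed):
        self._seed = seed
        self._lock = threading.Lock()    # hand-written stand-in for manager atomicity

    def next_rand(self):
        with self._lock:
            # Read the seed, apply rand_fn, store the new seed: one atomic step.
            r, self._seed = rand_fn(self._seed)
        return r

supplier = RandSupplier(seed=1)
results = []
results_lock = threading.Lock()

def worker():
    local = [supplier.next_rand() for _ in range(250)]
    with results_lock:
        results.extend(local)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With these constants the generator does not revisit a seed within 65,536 steps, so the 1,000 numbers collected by the four threads are all different, whatever the interleaving.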


In general, a manager object can contain multiple components, and 
each handler may read and update more than one component. 
Atomicity is still guaranteed, i.e., another concurrent execution of a 
handler on the same object cannot read intermediate states. These 
properties of a managed object make it easy to establish and maintain 
invariants on the state of the object by ensuring that each state update 
by a handler maintains the invariant. 


Thus, managers are akin to Hoare’s monitors [3]. However, there is 
an important difference that has far-reaching consequences. Within a 
handler, the new value of each state component can be specified at 
most once. This allows handlers to be non-strict, i.e., the state update 
(critical section) and the return-value computation can be decoupled. 


One consequence of this decoupling is increased concurrency. The 
return value computation does not have to be in the critical section, so 
that the critical section may be released before the “return value” is 
computed. Conversely, the critical section does not have to be in the 
critical path of the caller of a handler, so that a result may be returned 
to the caller before the critical section is completed. 




A second major consequence is that a manager can have complete 
control over the scheduling of concurrent accessors (events, waits and 
signals, in monitor terminology). To achieve this, we use a feature in 
Id called “I-structures.” I-structures are dynamically allocated and, at 
first, appear to be ordinary updateable cells. For example, the function 
make_cell allocates a cell, the procedure put_cell stores a value in
the cell, and the procedure get_cell reads a value from the cell.
However, I-structure semantics make them different from updateable 
cells. In particular, a value may be put into a cell at most once, and 
get_cell automatically blocks on an empty cell until a value has been 
written there. An error occurs if an attempt is made to put a value 
twice into a cell. 
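As a rough illustration only (this is not Id, and not how Id implements I-structures), the cell behavior described above can be mimicked in Python with a thread event. The names Cell, put, get and demo are our own, chosen to mirror make_cell, put_cell and get_cell.

```python
import threading


class Cell:
    """Write-once cell: get() blocks until put() has supplied a value."""

    def __init__(self):
        self._filled = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def put(self, value):
        with self._lock:
            if self._filled.is_set():
                # Mirrors the I-structure rule: a second write is an error.
                raise RuntimeError("cell already written")
            self._value = value
            self._filled.set()

    def get(self):
        self._filled.wait()      # blocks while the cell is empty
        return self._value


def demo():
    cell = Cell()
    results = []
    reader = threading.Thread(target=lambda: results.append(cell.get()))
    reader.start()               # the reader waits until a value arrives
    cell.put("hello")            # the single permitted write
    reader.join()
    try:
        cell.put("again")
    except RuntimeError:
        results.append("second put rejected")
    return results
```

Whether the reader starts before or after the write, it sees the same value, which is the property that makes such cells convenient for deferred responses.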


We illustrate this device with a manager for a queue of strings. A
small modification of this example will be used later in our Doctor’s
office solution. We wish to use handlers enq and deq to enqueue and
dequeue strings from the queue, respectively. A fundamental difference
from, say, the rand_supplier manager is that the queue manager
may have to respond to requests out of order. In particular, if one
process attempts deq on an empty queue, its response must be deferred
until another process performs an enq operation. We call such
managers scheduling managers, as opposed to simple managers like
rand_supplier.


The code for the queue manager is shown below. 


manager queue =
    Cons_queue (list S)            % queued strings
               (list (cell S))     % waiting dequeuers
  { def make_queue xs = Cons_queue xs Nil ;
    def enq (Cons_queue xs cs) x =              % x is a string
      { case cs of
          % Enter x at end of queue
          Nil     = { new xs = xs ++ (x:Nil)
                      In
                        Ok }
        | (c:cs') = { new cs = cs' ;
                      % Send x to the dequeueing process that is
                      % blocked on c
                      call put_cell c x
                      In
                        Ok }};
    def deq (Cons_queue xs cs) =
      { case xs of
          % Make a cell to block on
          Nil     = { c = make_cell _ ;
                      % Enter it at end of dequeuers
                      new cs = cs ++ (c:Nil)
                      % Block, trying to read the cell
                      In
                        get_cell c }
        | (x:xs') = { new xs = xs' ;
                      In
                        x }}};


In the first line, we indicate that the internal representation of a 
queue object is built with the Cons_queue constructor, and contains 
two components—a list of strings (S is the type for strings) and a list of 
cells that can contain strings (cells are typed objects). The first list 
represents the strings that are currently enqueued. The second list 
represents the cells on which dequeuing processes are waiting. Note 
that both lists will never simultaneously be non-empty, i.e., either 
there can be strings enqueued or there can be waiting processes, but 
not both. 


The constructor make_queue takes an initial list of strings xs, builds 
a queue object containing these strings and an empty list of waiting 
cells, and returns this object. 




The enq handler checks if there are any waiting processes (cells 
cs). If not, it attaches the given string x at the end of the list of en- 
queued strings (using the built-in list-concatenation operator ++). If 
there is a waiting process (blocked on cell c), it puts the string x into 
the cell (thereby unblocking the waiting process and giving it the 
string), and updates the list of cells to be the rest of the cells cs'. In 
either case, the constant Ok is returned as a result to the enqueuing 
process. The keyword call is used to indicate that the expression
following it is executed purely for its side-effect (in this case, the call
to put_cell).


The deq handler checks if there are any enqueued strings xs. If 
not, it allocates an empty cell c, appends it to the list of waiting cells, 
and blocks trying to read the cell c. Because of the non-strict evalua- 
tion mechanism, the blocking takes place outside the critical section, 
i.e., the state update can take place and the object is then available for 
other concurrent processes, one of which will presumably unblock 
this process by writing a value into c. Note that we use “_” as a “don’t 
care” argument to make_cell. 


If there is an enqueued string x available to a deq request, it is
returned as the result, and the state is updated to contain the remaining
enqueued strings xs'.


Thus, the semantics of managers allow us to choose the order in
which a manager responds to requests. This is in contrast
to the “implicit” scheduling imposed by the wait and signal constructs
in Hoare’s monitors.
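The queue manager can be approximated in Python as a minimal sketch, under stated assumptions: a lock stands in for the manager's atomic state update, and a simplified write-once Cell (our own class, safe here only because each cell is written once under the lock) stands in for an I-structure. The key point it tries to show is that the blocking read happens after the lock has been released, as with Id's non-strict handlers. The class and method names are hypothetical.

```python
import threading


class Cell:
    """Simplified write-once cell; get() blocks until put() is called."""

    def __init__(self):
        self._filled = threading.Event()
        self._value = None

    def put(self, value):
        if self._filled.is_set():
            raise RuntimeError("cell already written")
        self._value = value
        self._filled.set()

    def get(self):
        self._filled.wait()
        return self._value


class QueueManager:
    def __init__(self, xs=()):
        self._lock = threading.Lock()   # plays the role of the critical section
        self._xs = list(xs)             # queued strings
        self._cs = []                   # cells of waiting dequeuers

    def enq(self, x):
        with self._lock:
            if self._cs:
                # Hand x directly to the longest-waiting dequeuer.
                self._cs.pop(0).put(x)
            else:
                self._xs.append(x)
        return "Ok"

    def deq(self):
        with self._lock:
            if self._xs:
                return self._xs.pop(0)
            c = Cell()
            self._cs.append(c)
        # The lock is released here, so other requests can proceed
        # while this caller waits for its cell to be filled.
        return c.get()
```

At most one of the two lists is non-empty at any time, matching the invariant noted for the Id version.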


4.4. A non-deterministic simulator for the doctor's office


From the queue example, it is but a small step to the non-deter- 
ministic doctor’s office simulator. The code is shown in Figure 8. The 
top-level function is simulate which takes a random-number supply 
(an object as discussed above), a list of patients, a list of doctors, and a 


200 R. S. Nikhil and Arvind 


number max_iter which determines the duration of the simulation. It 
creates a doctor's office initialized with all doctors free (to be de- 
scribed below). The for p-loop, being a parallel loop, simulates all 
patients simultaneously. 


The for j-loop simulates the life of each patient. Until the end of the
simulation (see footnote 5), patient p repeatedly behaves as follows. He is healthy for a
random duration (using the procedure delay) and then falls sick, at 
which point he asks the office for a doctor. When he gets a doctor, he 
consults with him for a random duration (second delay) and is then 
cured. He then returns the doctor to the office and repeats the cycle. 


The THEN separator is used to force a sequencing where otherwise 
things would have been done in parallel. When all the loops have 
terminated (ensured by the last THEN), we extract and return the his- 
tories maintained in the office. 


The manager definition for the doctors’ office is shown next in 
Figure 8. Like the queue manager, it also maintains two queues: a 
queue of free doctors and a queue of waiting sick patients; in addition, 
it maintains the three history lists (patients falling sick, patients-and- 
doctors going into consultation and patients cured). We model a doc- 
tor by a string (the doctor’s name). A waiting patient is modeled by a 
string (the patient’s name) and a cell that represents the place where 
a doctor should be put when one becomes available. 


The constructor make_office simply creates a new office object 
with the free-doctors component initialized to the given list of doc- 
tors, and all other components empty. 




def simulate rand_supply patients doctors max_iter =
  { office = make_office doctors ;
    {for p <- patients do                            % For each patient
       {for j <- 1 to max_iter do
          delay (next_rand rand_supply);             % Be healthy for random time
          THEN d = get_doctor office p;
          THEN delay (next_rand rand_supply);        % Consult for random time
          THEN call put_doctor office p d
          THEN }};
    THEN
    In
      histories office };

manager office = Make_office
                   (list S)                          % waiting doctors
                   (list (S, cell S))                % waiting patients
                   (list S) (list (S,S)) (list S)    % histories
  { def make_office doctors = Make_office doctors Nil Nil Nil Nil ;
    def get_doctor (ds,sps,h1,h2,h3) p =
      { case ds of
          (d:ds') = { new ds = ds' ;                 % Doctor d available
                      new h2 = (p,d):h2 ; new h1 = p:h1
                      In
                        d }
        | Nil     = { c = make_cell _ ;              % No doctor available
                      new sps = sps ++ (p,c):Nil ;
                      new h1 = p:h1
                      In
                        get_cell c }};
    def put_doctor (ds,sps,h1,h2,h3) p d =
      { case sps of
          ((p',c):sps') = { new sps = sps';          % Patient p' waiting
                            call put_cell c d;       % Send doctor to him
                            new h2 = (p',d):h2 ; new h3 = p:h3
                            In
                              Ok }
        | Nil           = { new ds = ds ++ d:Nil ;   % No patient waiting
                            new h3 = p:h3
                            In
                              Ok }};
    def histories (ds,sps,h1,h2,h3) =
      (reverse h1), (reverse h2), (reverse h3) ;};


Figure 8 - Non-deterministic program for the doctor’s office problem 




The handler get_doctor checks if there is any free doctor d. If so, 
he is removed from the free list ds. Then, p and d are entered in his- 
tory h2 and d is returned to the requestor. If there is no free doctor, 
then the response is deferred using a cell c which is appended with p 
into the list of waiting sick patients. In either case, p is entered in 
history h1. 


The handler put_doctor checks if there is any waiting sick patient
p'. If so, p' is taken off the waiting list, the released doctor is immediately
sent to p' via the associated cell c, and (p',d) is entered in h2.
If there are no waiting patients, the doctor is returned to the free doctors
list. In either case, the cured patient p is entered in history h3,
and an acknowledgment (Ok) is returned to him.


Finally, the handler histories returns the three histories, first reversing
them because they were collected in the reverse order.
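For readers who want to experiment, the office manager and the simulation loop can be approximated with Python threads. This is only a sketch under our own assumptions: a lock replaces the manager's atomic update, a simplified Cell replaces the I-structure, delays are short random sleeps, and histories are appended in order (so no final reversal is needed). The names Office, simulate and the seed parameter are hypothetical, not part of the Id program.

```python
import random
import threading
import time


class Cell:
    """Simplified write-once cell; get() blocks until put() is called."""

    def __init__(self):
        self._filled = threading.Event()
        self._value = None

    def put(self, value):
        self._value = value
        self._filled.set()

    def get(self):
        self._filled.wait()
        return self._value


class Office:
    def __init__(self, doctors):
        self._lock = threading.Lock()
        self.ds = list(doctors)            # free doctors
        self.sps = []                      # waiting (patient, cell) pairs
        self.h1, self.h2, self.h3 = [], [], []

    def get_doctor(self, p):
        with self._lock:
            self.h1.append(p)              # p has fallen sick
            if self.ds:
                d = self.ds.pop(0)
                self.h2.append((p, d))
                return d
            c = Cell()
            self.sps.append((p, c))
        return c.get()                     # wait outside the critical section

    def put_doctor(self, p, d):
        with self._lock:
            self.h3.append(p)              # p is cured
            if self.sps:
                p2, c = self.sps.pop(0)
                self.h2.append((p2, d))
                c.put(d)                   # send the doctor to the waiting patient
            else:
                self.ds.append(d)
        return "Ok"


def simulate(patients, doctors, max_iter, seed=0):
    office = Office(doctors)
    rng = random.Random(seed)
    rng_lock = threading.Lock()

    def delay():
        with rng_lock:
            t = rng.random() * 0.001
        time.sleep(t)

    def life(p):
        for _ in range(max_iter):
            delay()                        # be healthy for a random time
            d = office.get_doctor(p)
            delay()                        # consult for a random time
            office.put_doctor(p, d)

    threads = [threading.Thread(target=life, args=(p,)) for p in patients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return office.h1, office.h2, office.h3
```

The order of entries in the three histories varies from run to run, which is the non-determinism the section title refers to; only their lengths and contents are fixed.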


4.4.1. A test run 


The code we ran was a small variation on the code shown in Figure 8 
because our experimental implementation is based on an older nota- 
tion. The parallelism profile is shown in Figure 9. 


5. Skyline matrix solver 
5.1. Crout’s method for LU decomposition 


To solve the system of linear equations Ax = b, our approach is 
based on LU decomposition using Crout’s method, as described in [8]. 
We first show a solution for dense matrices, and then modify it for 
skyline matrices. Suppose we express A as the product of some L and 
U, which are lower and upper triangular matrices, respectively: 


$$L \cdot U = A$$


Figure 9 - Parallelism profile for non-deterministic Doctor program
(20 patients, 4 doctors). Horizontal axis: steps. Critical path
length: 3,985; total operations: 52,140.


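For example, for n = 4:

$$
\begin{bmatrix}
l_{11} & 0 & 0 & 0 \\
l_{21} & l_{22} & 0 & 0 \\
l_{31} & l_{32} & l_{33} & 0 \\
l_{41} & l_{42} & l_{43} & l_{44}
\end{bmatrix}
\cdot
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & u_{44}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
$$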


Then, it is clear that:

$$A \cdot x = L \cdot U \cdot x = b$$

So, we can solve for x in two stages. In the forward substitution stage,
we find y such that:

$$L \cdot y = b$$

and then, in the backward substitution stage, we find x such that:

$$U \cdot x = y$$


Each of the last two equations involves triangular matrices, and so 
the solution is easy. For forward substitution: 


$$y_1 = \frac{b_1}{l_{11}}$$

$$y_i = \frac{1}{l_{ii}} \left[ b_i - \sum_{j=1}^{i-1} l_{ij}\, y_j \right], \qquad 2 \le i \le n$$


and, for backward substitution: 


$$x_n = \frac{y_n}{u_{nn}}$$

$$x_i = \frac{1}{u_{ii}} \left[ y_i - \sum_{j=i+1}^{n} u_{ij}\, x_j \right], \qquad 1 \le i \le n-1$$


Let us consider how to decompose A into L and U. It is clear from
the equation L · U = A that the (i,j)-th element of A is the inner-product
of the i-th row of L and the j-th column of U, i.e.,

$$l_{i1} u_{1j} + l_{i2} u_{2j} + \cdots + l_{in} u_{nj} = a_{ij}$$

However, since $l_{ij}$ is zero whenever i < j and $u_{ij}$ is zero whenever i > j,
this equation can be separated into two cases (note the final term in
each case):

$$i \le j: \quad l_{i1} u_{1j} + l_{i2} u_{2j} + \cdots + l_{ii} u_{ij} = a_{ij}$$

$$i > j: \quad l_{i1} u_{1j} + l_{i2} u_{2j} + \cdots + l_{ij} u_{jj} = a_{ij}$$

Further, it is always possible to choose the diagonal elements of L (i.e.,
$l_{ii}$) to be 1. The last two equations can then be rearranged as:

$$u_{ij} = a_{ij} - \sum_{k=1}^{i-1} l_{ik}\, u_{kj}, \qquad 1 \le j \le n,\; 1 \le i \le j$$

$$l_{ij} = \frac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, u_{kj} \right), \qquad 1 \le j \le n,\; j+1 \le i \le n$$


Since L’s diagonal elements ($l_{ii}$) are assumed to be one, we do not
compute them, and we do not store them. When this diagonal of L is
omitted, the remaining L and U elements have disjoint indices.
Therefore, they can be stored in a single n x n matrix called LU. This
structure, along with the computation of the L and U elements, is depicted
in Figure 10. The forward and backward substitution computations
are depicted in Figures 11 and 12, respectively.


5.2. LU decomposition of dense matrices 


The Id code for LU decomposition of a dense matrix follows the 
equations given above exactly (Figure 10): 


def LUDCMP_dense A =
  { (_,_),(_,n) = matrix_bounds A ;
    LU = {matrix (1,n),(1,n)
            % upper
          | [i,j] = ufn i j || j <- 1 to n & i <- 1 to j
            % lower
          | [i,j] = lfn i j || j <- 1 to n & i <- (j+1) to n } ;
    % upper
    def ufn i j = sum_down 1 (i-1) A[i,j] (term i j) ;
    % lower
    def lfn i j = (1/LU[j,j]) *
                  (sum_down 1 (j-1) A[i,j] (term i j)) ;
    def term i j k = LU[i,k]*LU[k,j] ;
  In
    LU };


In the first line in the block, we find n, the dimension of the problem.
The primitive function matrix_bounds returns the index bounds
of A, represented as a pair of pairs. The pattern on the left-side of the
binding shows this pair-of-pairs structure, and ignores three of the
components (using “_” for a “don't-care” pattern) and binds n to the
fourth. Here, we are assuming that A has bounds (1,n), (1,n); it
would be quite easy to bind all four components and verify that this is
indeed so.


Figure 10 - Computation of LU from A


Figure 11 - Solving L * y = b (forward substitution)


Figure 12 - Solving U * x = y (backward substitution)


The second binding defines the actual LU matrix, using Id’s array 
comprehension notation. It specifies the contents of the 2-dimen- 
sional array in two regions corresponding to U and L, respectively, 
using the functions ufn and lfn. Each expression of the form
(sum_down k1 k2 a f) computes

$$a - \sum_{k=k_1}^{k_2} f(k)$$

the Id code for which is:


def sum_down k1 k2 a f = { for k <- k1 to k2 do
                             next a = a - (f k)
                           finally a };


Here is the code for the forward and backward substitutions. Again,
the code is self-evident, corresponding exactly to the equations and
Figures 11 and 12.


def LUBKSB_dense LU B =
  { (_,n) = bounds B ;
    Y = {vector (1,n)
         | [1] = B[1]
         | [i] = sum_down 1 (i-1) B[i] (yfn i) || i <- 2 to n };
    def yfn i j = LU[i,j] * Y[j] ;
    X = {vector (1,n)
         | [n] = Y[n] / LU[n,n]
         | [i] = (1/LU[i,i]) * (sum_down (i+1) n Y[i] (xfn i))
                 || i <- (n-1) downto 1 };
    def xfn i j = LU[i,j] * X[j] ;
  In
    X };
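The dense algorithm can be cross-checked with a small Python transcription. This is a sketch, not the Id program: indices are 0-based, evaluation is sequential (we fill LU column by column so that every element is available when needed, whereas Id leaves the ordering to dataflow), and no pivoting is done, so it assumes the diagonal elements of U are non-zero. The lower-case function names are our own.

```python
def sum_down(k1, k2, a, f):
    """Return a - sum of f(k) for k = k1..k2 (inclusive)."""
    for k in range(k1, k2 + 1):
        a -= f(k)
    return a


def ludcmp_dense(A):
    """Crout decomposition with unit diagonal in L; L and U share one matrix."""
    n = len(A)
    LU = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Upper part of column j (including the diagonal): u_ij, i <= j.
        for i in range(j + 1):
            LU[i][j] = sum_down(0, i - 1, A[i][j],
                                lambda k: LU[i][k] * LU[k][j])
        # Lower part of column j: l_ij, i > j.
        for i in range(j + 1, n):
            LU[i][j] = sum_down(0, j - 1, A[i][j],
                                lambda k: LU[i][k] * LU[k][j]) / LU[j][j]
    return LU


def lubksb_dense(LU, B):
    n = len(B)
    # Forward substitution: L y = b, with l_ii = 1.
    Y = [0.0] * n
    for i in range(n):
        Y[i] = sum_down(0, i - 1, B[i], lambda j: LU[i][j] * Y[j])
    # Backward substitution: U x = y.
    X = [0.0] * n
    for i in range(n - 1, -1, -1):
        X[i] = sum_down(i + 1, n - 1, Y[i],
                        lambda j: LU[i][j] * X[j]) / LU[i][i]
    return X


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
B = [7.0, 19.0, 49.0]           # chosen so that the solution is x = (1, 2, 3)
LU = ludcmp_dense(A)
X = lubksb_dense(LU, B)
```

For this input, LU holds U on and above the diagonal and the sub-diagonal elements of L below it.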


Figure 13 - Data Structure for A (LU is similar)


5.3. LU decomposition of skyline matrices 


In Figure 10, in each inner-product, note that the low index of each
component vector is always 1. In a skyline matrix, on the other hand,
the low index of the horizontal vector will be some i1 ≥ 1 and the low
index of the vertical vector will be some j1 ≥ 1. Thus, the inner product
can be “clipped” to begin at max(i1, j1).


A second important observation is that it is clear from the equa- 
tions that the LU matrix will always have exactly the same skyline 
shape as the original A matrix, so that the data structure for LU can be 
identical to the data structure for A. 


The data structure that we choose for A (and LU) is shown in
Figure 13. The sub-diagonal elements of A are held in AL, which is an
n-vector of vectors. The i-th row is represented by a vector with dimensions
(j1, i-1), where 1 ≤ j1 ≤ i is the minimum index. When
j1 = i, the lower index is greater than the upper index, representing
an empty row vector. These are depicted by little circles “o” in the
figure. The diagonal and super-diagonal elements of A are held in AU,
which is also an n-vector of vectors. The j-th column is represented by a
vector with dimensions (i1, j), where 1 ≤ i1 ≤ j is the minimum index.
Note that none of the column vectors can be empty.




The code for the decomposition function is shown in Figure 14. In
the definition of U, the j-th column vector is specified as a vector with
the same bounds as the j-th column vector in AU (from i1 to j). In
calling ufn, we supply it i1 and j1, the lowest indices of the column
vector and row vector of the inner-product. In ufn, we clip the iteration
to begin at max(i1,j1). A similar strategy is used in the specification
of L.


The code for forward and backward substitution is shown in Figure 
15. Recall that in the dense version, yfn was defined thus: 


def yfn i j = LU[i,j] * Y[j] ;


Now, however, the term LU[i,j] must be replaced by L[i][j].
However, L[i] is fixed for each sum_down traversal, and so we optimize
it by passing the entire row vector L[i] to yfn, which then just indexes
it with j.


In xfn, we run across the following problem: the inner-product tra- 
verses a row of U. However, because of our representation of the sky- 
line U, not all elements in a particular row may be present. So, our 
code first extracts the j-th column vector, and then extracts the index 
bounds of that vector. If i is outside the bounds, the term is 0; otherwise,
we extract the normal LU[i,j] (i.e., U[j][i], which is uj[i]).


Of course, this conditional is executed O(n²) times (once for every
position in the upper triangle). We could trade time for space by first
reformatting U into a full upper triangular matrix (filling in all the zeroes)
and then indexing it as usual. We do not pursue this possibility
here.


Finally, the top-level function to solve a given set of equations is

def solve_sky (A,B) = LUBKSB_sky (LUDCMP_sky A) B;


where we assume that A is a pair of skylines (AL, AU).
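The skyline scheme can also be tried out in Python. The sketch below makes several assumptions of its own: each skyline vector is stored as a pair (lo, values) holding indices lo, lo+1, ...; indices are 0-based; an empty row of AL is written (i, []); evaluation is sequential, column by column; and there is no pivoting. It is meant only to show the clipping at max(i1, j1) and the bounds test in the backward substitution, not to reproduce the Id program.

```python
def sum_down(k1, k2, a, f):
    """Return a - sum of f(k) for k = k1..k2 (inclusive)."""
    for k in range(k1, k2 + 1):
        a -= f(k)
    return a


def ludcmp_sky(AL, AU):
    n = len(AU)
    # L and U have exactly the same skyline shape as AL and AU.
    L = [(lo, [0.0] * len(v)) for lo, v in AL]
    U = [(lo, [0.0] * len(v)) for lo, v in AU]

    def term(i, j):
        j1, row = L[i]          # row i of L, columns j1..i-1
        i1, col = U[j]          # column j of U, rows i1..j
        return lambda k: row[k - j1] * col[k - i1]

    for j in range(n):
        i1 = AU[j][0]
        for i in range(i1, j + 1):              # u_ij inside the envelope
            j1 = AL[i][0]
            U[j][1][i - i1] = sum_down(max(i1, j1), i - 1,
                                       AU[j][1][i - i1], term(i, j))
        u_jj = U[j][1][j - i1]
        for i in range(j + 1, n):               # l_ij inside the envelope
            j1 = AL[i][0]
            if j1 <= j:
                L[i][1][j - j1] = sum_down(max(i1, j1), j - 1,
                                           AL[i][1][j - j1], term(i, j)) / u_jj
    return L, U


def lubksb_sky(L, U, B):
    n = len(B)
    Y = [0.0] * n
    for i in range(n):
        j1, row = L[i]
        Y[i] = sum_down(j1, i - 1, B[i], lambda j: row[j - j1] * Y[j])
    X = [0.0] * n

    def xfn(i):
        def f(j):
            i1, col = U[j]
            # Elements above the skyline of column j are implicit zeroes.
            return 0.0 if i < i1 else col[i - i1] * X[j]
        return f

    for i in range(n - 1, -1, -1):
        i1, col = U[i]
        X[i] = sum_down(i + 1, n - 1, Y[i], xfn(i)) / col[i - i1]
    return X


def solve_sky(AL, AU, B):
    L, U = ludcmp_sky(AL, AU)
    return lubksb_sky(L, U, B)


# The matrix [[2,1,0],[4,3,3],[0,7,9]] stored as skylines.
AL = [(0, []), (0, [4.0]), (1, [7.0])]
AU = [(0, [2.0]), (0, [1.0, 3.0]), (1, [3.0, 9.0])]
X = solve_sky(AL, AU, [4.0, 19.0, 41.0])
```

The right-hand side was chosen so that the exact solution is (1, 2, 3).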




def LUDCMP_sky (AL, AU) =
  { (_,n) = bounds AU ;
    U = {vector (1,n)
         | [j] = { (i1,_) = bounds AU[j]
                   In
                     {vector (i1,j)
                      | [i] = { (j1,_) = bounds AL[i]
                                In
                                  ufn i1 j1 i j } || i <- i1 to j }}
           || j <- 1 to n };
    L = {vector (2,n)
         | [i] = { (j1,_) = bounds AL[i]
                   In
                     {vector (j1,i-1)
                      | [j] = { (i1,_) = bounds AU[j]
                                In
                                  lfn i1 j1 i j } || j <- j1 to (i-1) }}
           || i <- 2 to n };
    def ufn i1 j1 i j =
          sum_down (max i1 j1) (i-1) AU[j][i] (term i j);
    def lfn i1 j1 i j =
          (1/U[j][j]) * (sum_down (max i1 j1)
                                  (j-1) AL[i][j] (term i j));
    def term i j k = L[i][k] * U[j][k] ;
  In
    L,U };


Figure 14 - LU decomposition for skyline matrices 


def LUBKSB_sky (L,U) B =
  { (_,n) = bounds B ;
    Y = { vector (1,n)
          | [1] = B[1]
          | [i] = { (j1,_) = bounds L[i]
                    In
                      sum_down j1 (i-1) B[i] (yfn L[i] i) }
            || i <- 2 to n };
    def yfn Li i j = Li[j] * Y[j] ;
    X = { vector (1,n)
          | [n] = Y[n] / U[n][n]
          | [i] = (1/U[i][i]) * (sum_down (i+1) n Y[i] (xfn i))
            || i <- (n-1) downto 1 };
    def xfn i j = { uj = U[j] ;
                    (i1,_) = bounds uj ;
                    In
                      if (i < i1) then 0.0
                      else uj[i] * X[j] };
  In
    X };


Figure 15 - Forward and backward substitution for skyline matrices


5.4. Test runs


We generated a random 50 x 50 A matrix (i.e., picked a random envelope
and filled it with random numbers) containing 1210 elements,
i.e., a density of about 50%. We also generated a randomly-filled B vector
of size 50. Figure 16 shows the parallelism profile generated by
GITA when solve_sky is run on these inputs. Figures 17 and 18 show
the individual contributions of ludcmp_sky and lubksb_sky, respectively,
to the composite parallelism profile.


It is clear that almost all the parallelism in the skyline matrix solver 
is in the LU decomposition stage. This is actually quite obvious if we 
analyze Crout’s algorithm, which exhibits “wavefront” parallelism. 
Each element in the LU matrix depends only on those above it and to 
the left, so that a frontier that is perpendicular to the diagonal can be 


212 R. S. Nikhil and Arvind 


computed in parallel. This frontier sweeps across the matrix like a 
wave from the top left to the bottom right. 


In the forward and backward substitution stages, since the y and x
vectors are filled using linear recurrences, there is hardly any parallelism
at all.


In languages with explicit parallelism, the wavefront parallelism of 
LU decomposition could be expressed with a little effort for dense ma- 
trices. We could have a sequential loop that iterates down the diagonal, 
and a nested, parallel loop that computes all the cross-diagonal ele- 
ments in parallel. Unfortunately, it does not appear so easy to extend 
such a solution to skyline matrices, because of the irregular shape of 
the cross-diagonals. In Id, on the other hand, implicit parallelism 
gives us automatic producer-consumer synchronization, allowing it to 
adapt dynamically to such irregular parallel structures. 


Figure 16 - Parallelism profile for solve_sky (50 x 50 example).
Horizontal axis: steps. Critical path length: 4,948; total operations: 845,401.


Figure 17 - Parallelism profile for ludcmp_sky (50 x 50 example).
Horizontal axis: steps. Critical path length: 984; total operations: 728,205.


Figure 18 - Parallelism profile for lubksb_sky (50 x 50 example).
Horizontal axis: steps. Critical path length: 4,534; total operations: 117,143.




Acknowledgements 


Steve Heller was mainly responsible for our solution to the paraffins 
problem, developed in 1988. 


The non-deterministic doctor's office program was developed with 
much discussion with Paul Barth, who is also responsible for the im- 
plementation of managers and performing our test run. 


Although our skyline matrix solver uses Crout’s method, we have 
learned a lot from previous work on linear equation solvers that used 
Gauss elimination. K. Ekanadham of IBM Research developed elegant 
Id programs for dense matrices, and Javed Aslam, Christopher Colby 
and Ken Steele developed Id programs for sparse matrices. 


This report describes research done at the Laboratory for Computer 
Science of the Massachusetts Institute of Technology. Funding for the 
Laboratory is provided in part by the Advanced Research Projects 
Agency of the Department of Defense under the Office of Naval Research
contract N00014-89-J-1988.


Footnotes 


1. A first version of Id appeared in 1978. Since 1985, it underwent 
a series of revisions during which it was variously called Id/83s, Id86, 
Id Nouveau, etc., finally reverting to just “Id” again.


2. However, Id is in no way specific to dataflow machines. 
3. See [5] for an extensive discussion. 


4. We take this asymmetric view only because of the wording of the 
problem statement. If doctors went away for random periods between 
seeing patients, then we would have symmetry. We would model 
“consultations” as the resource that is demanded by two kinds of 
clients (doctors and patients). 


5. We have modeled the duration of the simulation as a fixed number 
of iterations, since the problem statement does not specify anything.
Another possibility would be to consult a real-time clock (which
would, of course, be implemented as a manager object).


References 


1. Arvind and J. Dean Brock. Resource Managers in Functional
   Programming. Journal of Parallel and Distributed Computing, 1(6),
   June 1984.

2. Arvind, S. K. Heller, and R. S. Nikhil. Programming Generality and
   Parallel Computers. In Proc. of the Fourth International
   Symposium on Biological and Artificial Intelligence Systems,
   Trento, Italy, September 1988, pgs. 255-286.

3. Hoare, C. A. R. Monitors: an Operating System Structuring
   Concept. Communications of the ACM, 17(10), October 1974, pgs.
   549-557.

4. Knuth, D. E. The Art of Computer Programming, Volume 1:
   Fundamental Algorithms. Addison Wesley, Reading, MA, 1973.

5. Knuth, D. E. The Art of Computer Programming, Volume 2:
   Semi-Numerical Algorithms. Addison Wesley, Reading, MA, 1973.

6. Nikhil, R. S. Id (Version 88.1) Reference Manual. CSG Memo 284,
   MIT Laboratory for Computer Science, Cambridge, MA, August
   1988.

7. Nikhil, R. S. and Arvind. Programming in Id: a parallel
   programming language. Book in preparation.

8. Press, W. H., et al. Numerical Recipes: The Art of Scientific
   Computing. Cambridge University Press, Cambridge, London,
   1986, pgs. 31-35.

9. Turner, D. A. The Semantic Elegance of Applicative Languages. In
   Proc. ACM Conference on Functional Programming Languages and
   Computer Architecture, Portsmouth, NH, October 1981, pgs. 85-92.


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
1992 Elsevier Science Publishers B.V. 217 


OCCAM 


Jean-Luc Gaudiot 
Department of Electrical Engineering-Systems 
University of Southern California 
Los Angeles, CA 90089-0782 


Dae-kyun Yoon 
Computer Science Department 
University of Southern California 
Los Angeles, CA 90089-0781 


1. Introduction
1.1. Brief History 


The language Occam has been under development at INMOS Ltd. 
since 1982. It was based on the collaboration between David May, 
head of the Architecture Group at INMOS, and C. A. R. Hoare, well 
known for Communicating Sequential Processes (CSP) [1,3]. David 
May spent several years developing ideas for programming concurrent 
processes. He formulated many of his ideas in a language known as 
EPL (Experimental Programming Language). Since then, David May 
has developed those original ideas into Occam as it is today.


The first development of Occam, called proto-Occam or Occam 1, 
provided the main constructions and mechanisms which form the ba- 
sis of the language. Since the initial release of Occam in January 1983, 
the language has been developed into a refined form known as Occam 
2. In Occam 2, strong typing and the mechanisms of abbreviation have
been introduced in addition to the refinement in the area of communication
protocol. Multi-dimensional arrays and side-effect-free functions
have also been implemented [3,4,6].




The Occam programming language is a high-level language de- 
signed for implementing concurrent programs on a network of pro- 
cessors. The language is intended to be implemented on any general 
purpose computer, but is particularly efficient on the transputer. The 
design of the INMOS transputer has been closely related to the devel- 
opment of Occam. Thus, the transputer can be considered an “Occam 
machine” [2]. 


1.2. Concurrency 


A sequential program specifies sequential execution of a list of 
statements, and its execution is called a process. A concurrent pro- 
gram specifies two or more sequential programs that may be executed 
concurrently as parallel processes. Consider the following example. 
(See Figure 1) 


Process 1:
    do forever...
        produce data
        send data to process 2
        do some other stuff


Process 2:
    do forever...
        do something
        receive data from process 1
        consume data and do some other stuff


The above example is found in many applications composed of several
processes interacting with each other. Each process is composed of
statements which are executed sequentially, while both process 1 and
process 2 are executed in parallel. The notations for sequential and
parallel processes in Occam are SEQ and PAR, respectively. Thus, the
above example can be written as follows using Occam constructions.




Figure 1 - Two communicating processes 


PAR
  WHILE TRUE
    SEQ
      produce data
      send data to process 2
      do some other stuff
  WHILE TRUE
    SEQ
      do something
      receive data from process 1
      consume data and do some other stuff


The process is the basic entity of an Occam program. Each state- 
ment (or line) is viewed as a process, and can be combined into a 
larger process by using constructions. In the above example, the two 
loops are independent processes which are executed concurrently. 
They only interact with one another at some point by exchanging data. 


1.3. Communication and Channel 


The mechanism of interaction between concurrent processes is an 
essential part of concurrent programming. Interaction between pro- 
cesses is accomplished by communication primitives in many pro- 
gramming languages. Occam provides paths between processes, called 
channels, and two primitive processes, input and output. Input and 
output processes are denoted by ? and !, respectively, and operate 
only on channels. 




A channel is visible as a global object among interacting processes. 
In other words, two processes communicate with each other by shar- 
ing the same channel. Using channels, our previous example could be 
rewritten as follows: 


CHAN OF ANY chan:
PAR
  WHILE TRUE
    SEQ
      produce data
      chan ! data
      do some other stuff
  WHILE TRUE
    SEQ
      do something
      chan ? data
      consume data and do some other stuff


The well-known rendezvous mechanism is applied to implement 
channel communication in Occam. Communication over a channel can 
only occur when both input and output processes are ready. If during 
execution of a program, an input statement is reached before its cor- 
responding output statement is reached, the input process waits until 
the output process is ready; likewise, if the output statement is 
reached first, the output process waits for the input process. This 
simplifies many problems caused by race conditions which are often 
found in concurrent programming. 
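To make the rendezvous concrete, here is a minimal Python sketch of a one-to-one synchronous channel. It is our own illustration, not how Occam or the transputer implements channels: the Channel class and its send and receive methods are hypothetical names for ! and ?, and the sketch assumes exactly one sender and one receiver per channel. The sender returns from send only after the receiver has taken the value.

```python
import queue
import threading


class Channel:
    """One-to-one synchronous channel (a sketch of rendezvous)."""

    def __init__(self):
        self._slot = queue.Queue()

    def send(self, value):          # roughly: chan ! value
        self._slot.put(value)
        self._slot.join()           # wait until the receiver has taken it

    def receive(self):              # roughly: chan ? variable
        value = self._slot.get()    # wait until a sender offers a value
        self._slot.task_done()      # release the waiting sender
        return value


def run(n):
    chan = Channel()
    consumed = []

    def producer():
        for i in range(n):
            chan.send(i * i)        # produce data, then send it

    def consumer():
        for _ in range(n):
            consumed.append(chan.receive())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

Whichever side arrives first simply waits for the other, so the values are consumed in the order produced regardless of thread scheduling.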


1.4. Occam Program Structure 


An Occam program is a collection of processes. Each process can 
be viewed as a collection of one or more processes which form a block 
structure. Contrary to many other block-structured languages, Occam 
recognizes blocks by indentation level. Therefore, in Occam, indentation
is not only for readability, but also for program structure.
Variables can be defined locally or globally according to the scope rule.
Consider the following Occam program, which evaluates the expression

(a - b) * (a*c + a/c)

in parallel.


-- Occam program which performs (a-b) * (a*c + a/c) in parallel
CHAN OF INT in1,in2,in3,out:
CHAN OF INT ch1,ch2,ch3,ch4:
INT a,b,c:
SEQ
  -- Get input for a,b and c
  PAR
    in1 ? a
    in2 ? b
    in3 ? c
  -- Perform every arithmetic in parallel
  PAR
    ch1 ! a - b
    ch2 ! a * c
    ch3 ! a / c
    -- evaluate (a*c + a/c)
    INT t1,t2:       -- note that t1,t2 are local var.
    SEQ
      PAR
        ch2 ? t1
        ch3 ? t2
      ch4 ! t1 + t2
    -- evaluate (a-b) * (a*c + a/c) and output result
    INT t1,t2:
    SEQ
      PAR
        ch1 ? t1
        ch4 ? t2
      out ! t1 * t2  -- output result


In this example, all the arithmetic operations are done in parallel.
No explicit control of sequence among operators is imposed. However,
careful examination of the channel communication shows that
some parts of the program will be executed sequentially. For example,
channels ch1, ch2, ch3, and ch4 have both a producer and consumer.
The consuming process must wait until the producer is ready to output.
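The same dataflow can be imitated with Python threads, as a sketch under our own assumptions: Channel is a simplified one-to-one synchronous channel, par is a hypothetical helper that runs its arguments concurrently and waits for all of them (like PAR), and integer division is written // (which agrees with Occam's INT division only for non-negative operands). The inputs are passed as arguments rather than read from in1, in2 and in3.

```python
import queue
import threading


class Channel:
    """Simplified one-to-one synchronous channel."""

    def __init__(self):
        self._slot = queue.Queue()

    def send(self, value):
        self._slot.put(value)
        self._slot.join()

    def receive(self):
        value = self._slot.get()
        self._slot.task_done()
        return value


def par(*procs):
    """Run all procs concurrently and wait for every one to finish."""
    threads = [threading.Thread(target=p) for p in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def evaluate(a, b, c):
    ch1, ch2, ch3, ch4, out = (Channel() for _ in range(5))
    result = []

    def adder():                     # evaluates a*c + a/c
        t = [None, None]             # local t1, t2

        def get1():
            t[0] = ch2.receive()

        def get2():
            t[1] = ch3.receive()

        par(get1, get2)
        ch4.send(t[0] + t[1])

    def multiplier():                # evaluates (a-b) * (a*c + a/c)
        t = [None, None]

        def get1():
            t[0] = ch1.receive()

        def get2():
            t[1] = ch4.receive()

        par(get1, get2)
        out.send(t[0] * t[1])

    par(lambda: ch1.send(a - b),
        lambda: ch2.send(a * c),
        lambda: ch3.send(a // c),
        adder,
        multiplier,
        lambda: result.append(out.receive()))
    return result[0]
```

Although all six processes start together, the multiplier cannot finish until the adder has sent on ch4, which is the implicit sequencing the paragraph above points out.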


Variables are defined immediately before the process in which they 
are used and remain effective during the execution of the process (or 
block). In our example, variables a, b, and c are global throughout the 
execution of the program, while variables t1 and t2 are local to the
process immediately following their declarations. 


As in other languages, comments are permitted in Occam. The symbol -- is
recognized as the start of a comment. Note, however, that a comment line
must be properly indented when it starts on a new line.


2. Processes 


2.1. Primitive Processes 


Three types of primitive processes, assignment, input/output, and
SKIP/STOP, form the basis for Occam operations.


An assignment process changes the value of a variable, just as it 
does in most conventional languages. The notation for the assignment 
is 


<variable> := <expression> 




Multiple assignment is also possible, as illustrated in the following ex- 
ample. 


a, b, c := x, y * z, x - z

Operators which can be used in the expression are as follows.

•  +, -, *, /, \ : Number operators, where \ is a remainder
   operation.


•  /\, \/, ><, ~, <<, >> : Logical and binary operators,
   which denote, respectively, and, or, exclusive-or, not,
   left-shift, and right-shift.


•  =, <, >, <=, >=, <> : Relational operators.

Left and right parentheses can be used in the expression.


Input and output processes (?, !) can be used for communication 
purposes. These work only on channels. The notation for input and 
output is as follows. 


channel ? variable 


channel ! expression 


The general form of input and output process will be discussed later in 
this chapter. 


There are special purpose processes which are used mainly for 
controlling the execution flow. 


•  STOP: Once started, it never terminates. So, if used in
   a SEQ block, the processes following STOP are never
   executed.


•  SKIP: Equivalent to no operation.




2.2. Constructions 


Larger processes are built by combining smaller processes using 
constructions. The following types of constructions are possible in 
Occam. 


SEQ: sequential 
PAR: parallel 
IF: conditional 
WHILE : loop 
ALT : alternation 
CASE : selection 
A sequential construction combines processes which are to be executed
in sequence. We have already seen examples of this construction.
A sequential construction may include other constructions,
forming a hierarchical structure.


The parallel construction is one of the most important construc- 
tions in the Occam language. It groups a number of processes which 
are to be executed in parallel. 


Each process in a conditional construction is guarded by a Boolean 
expression. The conditional evaluates each Boolean expression in se- 
quence. If an expression is found to be true, the processes following 
the expression are executed, and the conditional construction
terminates. If none of the expressions is found to be true, the conditional
construction behaves like STOP. Consider the following example:


IF 




If x ≥ y, the conditional behaves like STOP. To prevent this, we can modify
the conditional as follows


IF 


or 


TRUE 
SKIP 


The program segments above behave like an if-then-else-endif state- 
ment in conventional language. Conditionals may be nested to imple- 
ment more complex control flow. 
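The three forms of conditional discussed above can be sketched as follows. This is only an illustration: the variables x and y, their initial values, and the assignment used as the guarded process are assumptions, not taken from the original listing.

```occam
INT x, y:
SEQ
  x, y := 1, 3
  -- single guard: behaves like STOP whenever x >= y
  IF
    x < y
      x := x + 1
  -- explicit complementary guard
  IF
    x < y
      x := x + 1
    x >= y
      SKIP
  -- catch-all TRUE guard
  IF
    x < y
      x := x + 1
    TRUE
      SKIP
```

With the values chosen here the first guard holds in the first two conditionals and fails in the third, where the TRUE guard lets the conditional terminate.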


A loop repeats a process until an associated Boolean expression 
returns false. For example, 


SEQ
  i := 0
  WHILE i < number.of.inputs
    SEQ
      keyboard ? ch
      screen ! ch
      i := i + 1


Alternation provides a choice of process(es) depending on the state 
of communication channels. This construction is similar to the con- 
ditional construction, which provides a choice of processes depending 
on the value of a Boolean expression. Alternation checks all the input
channels and executes the process associated with the first channel to
respond. For example,


CHAN OF INT chan1, chan2, chan3:
INT x:
ALT
  chan1 ? x
    process 1
  chan2 ? x
    process 2
  chan3 ? x
    process 3


If only chan1 responds, then only process 1 is executed. If both
chan1 and chan2 respond, then the process associated with the first
channel to respond is executed. A guard may also include a Boolean
expression, in which case the associated process is executed only if
the channel is the first to respond and the Boolean expression is true.
This is shown in the following example.


CHAN OF INT chan1, chan2, chan3:
INT x:
ALT
  (y < 0) & chan1 ? x
    process 1
  (y = 0) & chan2 ? x
    process 2
  (y > 0) & chan3 ? x
    process 3


A selection (CASE) selects from a number of options by matching
the value of a selector against the values of a set of constant expressions
(similar to the switch statement in C).




CASE ch
  'a'
    screen ! 'A'
  'A'
    screen ! 'A'
  'b'
    screen ! 'B'
  'B'
    screen ! 'B'


We can group together those constant expressions for which the
associated processes are exactly the same.


CASE ch
  'a', 'A'
    screen ! 'A'
  'b', 'B'
    screen ! 'B'


2.3. Replicators 


In some cases, we may have a number of processes which behave 
the same as or similar to each other. Consider the following example: 


INT x:
SEQ
  input ? x
  input ? x
  input ? x
  input ? x
  input ? x


The equivalent form written using a replicator is as follows.




INT x:
SEQ i = 0 FOR 5
  input ? x


Five replicas of the input processes are created, and executed in se- 
quence. The replicator always includes one of the following: SEQ, PAR, 
ALT, IF. 


The general form of a replicator is: 


REP index = base FOR count 


process 


where REP is one of SEQ, PAR, ALT, or IF. The index is often used in
conjunction with an array, and its value ranges from base to base + count - 1.
Each value of the index is associated with the corresponding
replica of the process. Thus, count replicas are created.
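To make the index range concrete, here is a minimal sketch of a replicated PAR in which each replica uses its index to select one element of an array. The names n and squares are assumptions made for the example.

```occam
VAL n IS 4:
[n]INT squares:
PAR i = 0 FOR n
  squares[i] := i * i   -- i takes the values 0, 1, 2, 3, one per replica
```

Each of the four replicas writes to a different element, so the replicas do not interfere with one another.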


As an example, the basic client-server model can be implemented 
as follows. 


[number.of.clients]CHAN OF INT request:
[number.of.clients]CHAN OF BOOL acknowledge:
PAR
  -- server process
  INT service:
  WHILE TRUE
    SEQ
      ALT i = 0 FOR number.of.clients
        request[i] ? service
          SEQ
            perform(service, i)
            acknowledge[i] ! TRUE
      ... other book-keeping stuff
  -- Create replicas of client process
  PAR i = 0 FOR number.of.clients
    BOOL ack:
    INT service:
    WHILE TRUE
      SEQ
        ... determine service
        request[i] ! service
        acknowledge[i] ? ack
        ... do other stuff


In this example we have defined an array of channels, and assigned
each channel to a corresponding client process. Thus communication
between the server and each client is carried out through an exclusive
channel.


3. Types and Variables 
3.1. Variables 


The declaration of a variable defines the data type and name of the
variable. A name consists of alphabetic characters and digits, starting
with an alphabetic character. Occam distinguishes between upper and lower case.
A period (.) can be used to separate the words within a name.
Reserved words cannot be used as variable names. For example, these
are valid variable names:


x, Y, intvar, chan3, new.prod, old.prod. 
The declaration of variables has the following form: 
<type> <variable list> : 


Variables should be declared prior to the block in which they are used, 
and can be effective throughout the execution of the block unless they 
are redeclared. 




INT a, b, c:
SEQ
  p1 ..
  PAR
    BOOL c:
    p2 ..
    p3 ..


In the above example, a, b, and c are used as integer variables in p1
and p3. However, in p2, c is a Boolean variable. Thus, Occam follows
the general scope rule found in most structured languages.


3.2. Basic Types 


Occam provides the following basic types. 
BOOL, BYTE, INT, INT16, INT32, INT64, REAL32, REAL64 


INT16, INT32, INT64, REAL32, REAL64 denote the type and the size 
of the internal representation explicitly. 


Literal data can be used for each type. For example: 


• 42 : an integer literal in decimal.

• #2A : an integer literal in hexadecimal.

• 'x' : a byte literal.

• "letters" : a string literal.

• TRUE, FALSE : Boolean literals.

Literal data can be explicitly typed. This is often used to convert the
type of a literal.

• 42(BYTE) : a byte value.

• 'T'(INT) : an integer value.

• 42.0(REAL64) : a 64 bit floating point value.


3.3. Arrays 


An array is a group of objects of the same type, joined into a single 
named object. In Occam, array variables are declared in the same 
manner as scalar variables of any type, except that the size of the array 
is prefixed to the type specifier in brackets. An example of an array 
declaration is: 


[10]INT int.array:


Arrays may be passed as parameters to procedures. In Occam, we 
are not required to specify the size of an array parameter in the 
declaration of the procedure; thus, arrays of any size, but of the 
correct data type, may be passed as actual parameters. For example: 


PROC proc1 (CHAN OF INT input, []INT array)

A segment of an array is expressed as:

[array FROM subscript FOR count]


Array segments may be input, output, or assigned to in a statement as 
long as the left- and right-hand side have the same type. For example, 


[int.array FROM 10 FOR 5] := [int.array FROM 1 FOR 5]
[int.array FROM 10 FOR 5] := other.array


In the above example, the declaration for other.array should be 
[5] INT. 


Generating an array value in Occam is very easy. For example:

[10, 21, x+4, y+z]


generates an array of type [4]INT, assuming x, y, and z are integer
variables.




Array generation and array segments are often used in conjunction
with abbreviation. For example:


f IS [int.array FROM 1 FOR 5] :
VAL q IS [1, 3, 5, 7, 9] :

If an abbreviation is preceded by VAL, the abbreviated value cannot be modified.


Occam supports arrays with any number of dimensions and they are 
declared in a way consistent with what we have seen so far. For ex- 
ample a two dimensional array can be defined as: 


[20] [20]REAL32 real.matrix: 


Each element or each row can be referred to as a single variable. For
example, real.matrix[i][j] refers to the j-th element in the i-th
row, and real.matrix[i] refers to the whole i-th row.


In Occam, a string is treated as an array of type BYTE. For example,
"hello" is equivalent to ['h','e','l','l','o']. We cover the input
and output of strings in the next section.


4. Channel Communications


The underlying principle of Occam programming is communication
between processes, and channels are the primary means by which this
communication takes place. Every channel has a protocol, which defines
the type, or sequence of types, of the data carried over the channel. The channel
declaration has the following form:


CHAN OF protocol name: 


4.1. Simple Protocol 


The simplest kind of protocol consists of basic types. For example: 


CHAN OF INT comm: 
CHAN OF [20] INT chan: 




In the above example, chan always carries exactly 20 elements of type
INT. However, we can instead send the size of the array over the channel
ahead of the data. For example, the following process outputs two byte
arrays of different sizes on the same channel.


CHAN OF INT::[]BYTE comm:
[20]BYTE a:
[40]BYTE b:
SEQ
  comm ! 20::a
  comm ! 40::b


A corresponding input process is as follows: 


INT size:
[1000]BYTE string:
SEQ
  comm ? size::string
  comm ? size::string


The above example also illustrates how strings can be input and output 
over the same channel. 
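The count received with a counted array tells the receiver how much of its buffer holds valid data. The following procedure is a small sketch of this idea. The procedure name echo, its parameter names, and the buffer size are assumptions made for illustration.

```occam
PROC echo (CHAN OF INT::[]BYTE in, CHAN OF BYTE out)
  INT size:
  [1000]BYTE buffer:
  SEQ
    in ? size::buffer         -- size is received first, then size bytes
    SEQ i = 0 FOR size        -- forward only the bytes actually received
      out ! buffer[i]
:
```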


4.2. Naming Protocol 


A protocol can be named and used in numerous channel declara- 
tions. This is often useful if the protocol over a channel is complex. 
For example a sequential protocol can be defined with ;, as in: 


PROTOCOL Message IS BYTE; INT; INT: 


CHAN OF Message comm: 




The channel comm receives or sends a BYTE type followed by two INT 
type values. The output process contains a list of expressions with 
correct types, such as 


comm ! 10(BYTE); 321; 833 
The corresponding input process is 
comm ? x; y; z


A single channel can carry messages of several different formats by 
using a variant protocol. A variant protocol is a set of different proto- 
cols each of which is associated with a tag. The following definition 
defines a variant protocol named Messages: 


PROTOCOL Messages
  CASE
    a; INT; INT
    b; BYTE::[]BYTE
:


a and b are tags in this example. An example using this protocol is 


CHAN OF Messages comm:
PAR
  INT x, y:
  BYTE size:
  [256]BYTE v:
  comm ? CASE
    a; x; y
    b; size::v


Other processes may send messages such as 


comm ! a; 100; 250 




or 


comm ! b;5::"Hello World !!" 


Tags can be used as guards in an ALT construction, and we can define a
protocol which consists only of tags. This kind of protocol is often
used to input or output different signals over the same channel.
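A protocol consisting only of tags might look like the following sketch. The protocol name Signal, the tags begin and halt, and the procedure listener are hypothetical names introduced for the example.

```occam
PROTOCOL Signal
  CASE
    begin
    halt
:

PROC listener (CHAN OF Signal control, CHAN OF BYTE screen)
  BOOL running:
  SEQ
    running := TRUE
    WHILE running
      control ? CASE
        begin
          screen ! 'b'        -- report that a begin signal arrived
        halt
          running := FALSE    -- leave the loop on a halt signal
:
```

A sending process would issue control ! begin or control ! halt; no data follows the tag.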


5. Procedures and Functions


As in many other conventional languages, we can achieve a certain 
level of program abstraction by defining procedures or functions. 


5.1. Procedure 


A procedure definition associates a name with a process. The 
procedure call in the program is replaced with the procedure’s body. 
Consider the following procedure: 


PROC proc1 ()
  SEQ
    num := num + 1
    count := count - 1
:


Wherever the text string proc1 () appears in the program text, it is
replaced by the statements,


SEQ
  num := num + 1
  count := count - 1


The procedure definition may include formal parameters, which are
expanded into abbreviations at each call. For example, consider the following
definition




PROC write.string (CHAN OF BYTE c, VAL []BYTE s)
  SEQ i = 0 FOR SIZE s
    c ! s[i]
:


Then the call 


write.string (output, "Hello World !!") 


would expand to: 


CHAN OF BYTE c IS output:
VAL []BYTE s IS "Hello World !!":
SEQ i = 0 FOR SIZE s
  c ! s[i]


SIZE is a predefined operator which yields the number of elements
in an array.


Since the procedure call is implemented by in-line expansion, re- 
cursive procedure calls are not allowed. 


5.2. Functions 


A function is a special kind of process, called a value process. A
value process returns a result of basic data type (i.e., not an array type).
For example, consider the following unnamed function definition


big := (INT max:
        VALOF
          IF
            a > b
              max := a
            TRUE
              max := b
          RESULT max
       )




If a value process is named, then the name can be used in other
parts of the program to refer to the process. For example, we can
define the function max as


INT FUNCTION max (VAL INT a, b)
  INT max:
  VALOF
    IF
      a > b
        max := a
      TRUE
        max := b
    RESULT max
:


Now the function call: 
big := max(a,b) 


assigns the maximum value of a and b to big. 
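Because a function call is an expression, calls can be nested. A short sketch, assuming the definition of max given above is in scope and using illustrative variable names:

```occam
INT a, b, c, biggest:
SEQ
  a, b, c := 3, 9, 5
  biggest := max (max (a, b), c)   -- biggest becomes 9
```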


6. Configuration
6.1. Allocation of Multiple Processors 


The component processes of a PAR construction may be executed
on different processors. This can be specified by a special annotation,
PLACED PAR, which assigns a process to the specified processor.
Consider the following example:


PLACED PAR
  PROCESSOR 1
    proc1 ()
  PROCESSOR 2
    proc2 ()


Here, the processes proc1 and proc2 are placed on the processors
numbered 1 and 2. The annotation can also be replicated:




PLACED PAR
  PLACED PAR i = 0 FOR n
    PROCESSOR 2*i
      evenproc ()
  PLACED PAR i = 0 FOR n
    PROCESSOR 2*i+1
      oddproc ()


6.2. Hard Channels 


When a program is tightly coupled to the hardware, it may
need direct access to that hardware. The same is true when a program
runs on multiple transputers that must communicate with one another.
Both cases are external communications, which take place
over the links between transputers. Links are often
implemented by memory-mapped I/O, and are numbered by the
addresses of the memory locations that map directly to the external
hardware. Consider the following example:


PLACE screen AT 1: 
PLACE keyboard AT 2: 


Now the channels screen and keyboard map to an output device (link 1)
and an input device (link 2), respectively.


In addition to hard channels, Occam can declare I/O ports as used 
in conventional computer systems. An example of a port declaration is 


PORT OF BYTE serial: 


6.3. Priority 


On a single processor, we may assign relative priorities to each 
process in a PAR construction. Consider the example, 




PRI PAR
  proc1 ()
  proc2 ()
  ...
  procn ()


Here, proc_i has priority over proc_j if i < j. Therefore, proc_j can be
executed only when proc_1 ... proc_(j-1) are not ready to execute. PRI PAR
can also be replicated.


In addition, the input processes in an alternation construction may 
also have priorities. Consider the following example: 


PRI ALT
  chan1 ? data
    proc1 ()
  chan2 ? data
    proc2 ()
  ...
  TRUE & SKIP
    default.proc ()


In this example, proc_n is selected only if chan_n is ready to input and
chan_1 ... chan_(n-1) are not ready to input. If none of the inputs is ready,
default.proc is selected as a default process. As in an ordinary ALT, a PRI
ALT can have conditions in conjunction with the input processes.


6.4. Timer 


To provide a real-time programming environment, Occam supports 
a special type called TIMER. A simple example would look like: 


TIMER clock: 
INT time: 


clock ? time 




This simple timer process reads the current value of a real-time clock,
counted in “ticks” of the clock. We can declare as many timers as we want,
which is sometimes very useful, but different timers may hold different
values.


As a practical but simple example, we can define a delay
procedure as follows,


PROC delay (VAL INT interval)
  TIMER clock:
  INT timenow:
  SEQ
    clock ? timenow
    clock ? AFTER timenow PLUS interval
:


The process c ? AFTER b is a delay process which, like SKIP, does 
nothing (but, the process may be suspended), and terminates only 
when the reading of c satisfies the condition AFTER b. a AFTER b 
can be read as a condition, just like (a MINUS b) > 0. Thus, we can 
interpret the last line of the above program as “do nothing until the 
current time is after the starting time plus interval.” 
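The same delayed input can drive a periodic activity. The procedure below is a sketch only; the name ticker, its parameters, and the choice to send the tick number on a channel are assumptions made for illustration.

```occam
PROC ticker (CHAN OF INT tick, VAL INT period, VAL INT count)
  TIMER clock:
  INT next:
  SEQ
    clock ? next                    -- read the starting time
    SEQ i = 0 FOR count
      SEQ
        next := next PLUS period    -- time of the next tick
        clock ? AFTER next          -- wait until that time has passed
        tick ! i                    -- announce tick number i
:
```

Computing each deadline from the previous deadline, rather than from a fresh reading of the clock, keeps the time spent in the loop body from accumulating as drift.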


A timeout can be implemented as follows, 


TIMER clock:
VAL timeout IS 1000:
INT timenow:
SEQ
  clock ? timenow
  INT x:
  ALT
    input ? x
      ..some process...
    clock ? AFTER timenow PLUS timeout
      screen ! 18::"Timeout on input !"




When writing programs that use timers, it is essential to be aware of
the details of time representation. Clock values are integers that wrap
around, so the AFTER test is only meaningful if the
difference between the two times is small compared to the largest
value that can be represented by an integer.
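The wrap-around behaviour can be seen in a small sketch. The variable names are assumptions; PLUS is the modulo addition already used in the delay procedure, and MOSTPOS INT denotes the largest INT value.

```occam
INT t1, t2:
BOOL later:
SEQ
  t1 := MOSTPOS INT
  t2 := t1 PLUS 10       -- modulo addition: t2 wraps around to a negative value
  later := t2 AFTER t1   -- TRUE, because (t2 MINUS t1) is 10, which is > 0
```

Here t2 is numerically smaller than t1, yet it is still recognised as the later time because the two values are close together relative to the integer range.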


7. Examples 


The complete code for each problem appears in the Appendix. The 
reader is advised to read a section and then study the Occam code 
listed in the Appendix. 


7.1. Hamming’s Problem, Extended 


Given a set of primes, a, b, c, ... , generate the sequence of integers
of the form


$$a^i \, b^j \, c^k \cdots \le n$$


In Occam, the sizes of all structures must be known at compile 
time, except for the formal parameters of procedures. So we are 
forced to assume a fixed number of prime numbers instead of an infi- 
nite (or unknown) number of primes. Hamming numbers are gener- 
ated by computing output streams recursively. Since Occam does not 
allow recursive invocation, we are forced to use special techniques, 
such as pipelining processes or manipulating stacks [7], to implement 
recursion. In our solution we have chosen a feedback channel. To 
avoid deadlock, the numbers are queued as they are fed back and a 
number is issued by the queue manager only on request. The program 
structure is shown in Figure 2. 


Figure 2 - Program structure for Hamming's Problem, Extended
(figure labels: Request new number, Grant new number, Feedback)


7.2. Paraffins Problem


This problem asks us to generate all paraffin molecules of size i ≤ n,
where i is the number of carbon atoms. This is equivalent to generating
all trees of size n without duplicates. Enumerating trees is
discussed in [5]. For simplicity, we denote the paraffin structure as a list
of carbons. In the example program, we represent each paraffin as
a string of (, ), and C. For example, the paraffin C2H6 is denoted by
(C(C)).


Our solution of the paraffins problem uses a fixed-size
three-dimensional array. However, since a dynamic array structure is very
useful for this problem, we have introduced a delimiter ($) to denote
non-existing elements. Handling dynamic structures in Occam is
generally not trivial, and requires extra lines of code. In our example
program, we use only the SEQ construct because the current
implementation of Occam does not permit a variable number of parallel processes
and does not permit the indexing of global arrays in the body of
parallel components unless the value of the index is known at compile
time. If we carefully designed a communication network of processes
(assuming the number of processes can be known at compile time),
we could exploit some limited parallelism. However, we would pay a
large communication cost.


Figure 3 - Interaction between processes in the Doctor's Office Problem
(figure labels: Receptionist, Time out, Terminate, Release patient, Assign patient)


7.3. Doctor’s Office 


This example demonstrates how interactions between asynchronous
processes can be simulated in Occam. The example program
also illustrates real-time programming using TIMER. A major
portion of the program handles termination of concurrent processes.
The process responsible for the conclusion of the simulation is timeout.
When the time-out event occurs, termination signals propagate to
all processes involved in the simulation. The interaction between
processes is shown in Figure 3.




7.4. Skyline Matrix Solver 


In this problem, we solve a system of linear equations by computing
with only the non-zero elements of a skyline matrix, given two profile
vectors for the skyline of the matrix. As in the paraffins problem, we require
a dynamic array structure. However, as we stated earlier, Occam
requires the sizes of all array structures at compile time, so we cannot
store only the non-zero elements of the skyline matrix. To
compensate, we introduced error values to represent the zero elements.
Then, by ignoring the error values during computation, we eliminate
the zero elements from consideration. Although this problem has a
certain level of parallelism, we did not use any parallel constructs, for
simplicity. Using parallel constructs requires too many things to be
known at compile time, leading to a solution which is not general.


References 


1. Hoare, C. A. R. Communicating sequential processes.
Communications of the ACM, 21(8), August 1978.

2. Inmos. Transputer Development System 2.0. Inmos Ltd., 1987.

3. Inmos. Occam 2 Reference Manual. Prentice Hall, 1988.

4. Jones, G. and M. Goldsmith. Programming in Occam 2. Prentice
Hall, 1988.

5. Knuth, D. The Art of Computer Programming, Vol. 1: Fundamental
Algorithms. Addison-Wesley, 1987.

6. Pountain, R. and D. May. A Tutorial Introduction to Occam
Programming - including a language definition. Inmos Ltd., 1987.

7. Redfern, S. Implementing data structures and recursion in occam.
Technical Report 38, Central Applications Group,
Inmos Ltd., 1988.




Appendix 


*************************************
**** Hamming's Problem, Extended ****
*************************************


VAL N IS 1000:
VAL Nprime IS 10:
[Nprime]INT Primes:
[N+1]INT Hamming:

PROC HammingNumbers ()
  CHAN OF INT Feedback, RequestNum, GrantNum, Output:
  [Nprime]CHAN OF INT ToMult, FromMult:


  -- Collector :
  -- Proc collector gathers new numbers and eliminates duplicates.
  -- This proc sends a new hamming number back to "QueueManager" in
  -- order to generate a new number from this number.
  PROC Collector ()
    VAL delayunit IS 100:
    VAL nomoreinput IS 20:
    INT num, delaycnt:
    BOOL continue:
    SEQ
      SEQ i = 0 FOR N
        Hamming[i] := -1
      Feedback ! 1 (INT)   -- Send initial value
      continue := TRUE
      delaycnt := 0
      WHILE continue
        PRI ALT
          -- An input from one of Multipliers
          ALT j = 0 FOR Nprime
            FromMult[j] ? num
              SEQ
                delaycnt := 0
                IF
                  Hamming[num] = -1   -- No duplicate
                    SEQ
                      Hamming[num] := num
                      Feedback ! num
                  TRUE                -- duplicate
                    SKIP
          TRUE & SKIP
            SEQ
              Delay (delayunit)
              delaycnt := delaycnt + 1
              IF
                delaycnt > nomoreinput   -- End of computation
                  continue := FALSE
                TRUE
                  SKIP
      -- loop has been completed : output hamming numbers.
      INT outnum:
      SEQ i = 1 FOR N
        SEQ
          outnum := Hamming[i]
          IF
            outnum <> -1
              Output ! outnum
            TRUE
              SKIP
      Feedback ! -1   -- Send a termination signal
  : -- END of PROC Collector


  -- Multiplier
  -- Multiplies given number by one of prime numbers in the
  -- table "Primes[]".
  -- If the result is not greater than N then pass the result to the collector.
  PROC Multiplier (CHAN OF INT in, CHAN OF INT out, VAL INT prime)
    INT num:
    BOOL continue:
    SEQ
      continue := TRUE
      WHILE continue
        SEQ
          in ? num
          IF
            num < 0
              continue := FALSE
            TRUE
              SEQ
                num := num * prime
                IF
                  num > N
                    SKIP
                  TRUE
                    out ! num   -- Send to the collector
  :


  -- Splitter
  -- Get a new Hamming number from the feedback queue, and distribute to
  -- the multipliers.
  PROC Splitter ()
    INT num:
    BOOL continue:
    SEQ
      continue := TRUE
      WHILE continue
        SEQ
          RequestNum ! 1 (INT)
          GrantNum ? num
          IF
            num < 0
              continue := FALSE
            TRUE
              SKIP
          PAR i = 0 FOR Nprime
            ToMult[i] ! num
  :


  -- QueueManager
  -- This process is introduced to remove deadlock problem in the cycle of
  -- computation. "QueueManager" holds newly generated Hamming numbers
  -- in a queue, and upon request of "Splitter", it hands over the new
  -- number to "Splitter"
  PROC QueueManager ()
    INT num, ndata, front, rear, req:
    BOOL continue:
    [N+2]INT queue:   -- queue is actually unbounded, here.
    SEQ
      ndata, front, rear := 0, 0, 0
      continue := TRUE
      WHILE continue
        ALT
          (ndata > 0) & RequestNum ? req
            SEQ
              num := queue[front]
              front := front + 1
              ndata := ndata - 1
              GrantNum ! num
              IF
                num < 0
                  continue := FALSE
                TRUE
                  SKIP
          Feedback ? num
            SEQ
              queue[rear] := num
              rear := rear + 1
              ndata := ndata + 1
  :


  -- Main body of hamming numbers
  SEQ
    -- Init Primes[], Table of prime numbers:
    -- We can directly assign array values or generate the "Nprime" prime
    -- numbers.
    Primes := [2, 3, ...]
    PAR
      Splitter ()
      Collector ()
      QueueManager ()
      PAR i = 0 FOR Nprime
        Multiplier (ToMult[i], FromMult[i], Primes[i])
:


***************************
**** Paraffins Problem ****
***************************


VAL Maxnodes IS 10: 
VAL Maxtrees IS 50: 
VAL Maxsize IS Maxnodes*3+1: 


PROC Paraffins (VAL INT size, [][][]BYTE ParaffinTree)

  -- TwodimCat:
  -- Catenate two dimensional arrays.
  PROC TwodimCat (VAL [][]BYTE t1, VAL [][]BYTE t2, [][]BYTE t)
    INT i1, i2:
    SEQ
      i1, i2 := 0, 0
      WHILE t1[i1][0] <> '$'
        SEQ
          t[i1] := t1[i1]
          i1 := i1 + 1
      WHILE t2[i2][0] <> '$'
        SEQ
          t[i1] := t2[i2]
          i1, i2 := i1+1, i2+1
      t[i1][0] := '$'
  :
  -- OnedimCat
  -- Catenate strings (array of bytes)
  PROC OnedimCat (VAL []BYTE t1, VAL []BYTE t2, []BYTE t)
    INT i1, i2:
    SEQ
      i1, i2 := 0, 0
      WHILE t1[i1] <> '$'
        SEQ
          t[i1] := t1[i1]
          i1 := i1 + 1
      WHILE t2[i2] <> '$'
        SEQ
          t[i1] := t2[i2]
          i1, i2 := i1+1, i2+1
      t[i1] := '$'
  :

  -- TwodimArrayAdjust
  -- Adjust lower and upper bound of two dimensional array.
  -- i.e. Returns array with elements within this boundary.
  PROC TwodimArrayAdjust (VAL [][]BYTE a, VAL INT l, VAL INT u,
                          [][]BYTE a1)
    INT i:
    SEQ
      i := 0
      WHILE i < l
        SEQ
          a1[i][0] := '$'
          i := i + 1
      WHILE i <= u
        SEQ
          a1[i] := a[i]
          i := i + 1
      WHILE i < Maxtrees
        SEQ
          a1[i][0] := '$'
          i := i + 1
  :

  -- Xtrees
  -- Returns the cross product of set of trees of sizes i,j,k and l
  -- without duplicates.
  PROC Xtrees (VAL INT i, VAL INT j, VAL INT k, VAL INT l,
               VAL [][][]BYTE trees, [][]BYTE ntree)
    [Maxtrees][Maxsize]BYTE cross1, cross2:
    [Maxtrees][Maxsize]BYTE tmp:
    [Maxsize]BYTE tmp2:
    INT i1, i2, m, n, o:


    SEQ
      IF
        i = j
          SEQ
            n := 0
            WHILE trees[j][n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE trees[j][n][0] <> '$'
              SEQ
                TwodimArrayAdjust (trees[j], 1, n, tmp)
                m := 0
                WHILE tmp[m][0] = '$'
                  m := m + 1
                WHILE tmp[m][0] <> '$'
                  SEQ
                    OnedimCat (trees[j][n], tmp[m], cross1[i1])
                    m := m + 1
                    i1 := i1 + 1   -- one output slot per (n, m) pair
                n := n + 1
            cross1[i1][0] := '$'
        TRUE
          SEQ
            n := 0
            WHILE trees[j][n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE trees[j][n][0] <> '$'
              SEQ
                m := 0
                WHILE trees[i][m][0] = '$'
                  m := m + 1
                WHILE trees[i][m][0] <> '$'
                  SEQ
                    OnedimCat (trees[j][n], trees[i][m], cross1[i1])
                    m := m + 1
                    i1 := i1 + 1
                n := n + 1
            cross1[i1][0] := '$'
      IF
        k = l
          SEQ
            n := 0
            WHILE trees[k][n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE trees[k][n][0] <> '$'
              SEQ
                TwodimArrayAdjust (trees[k], 1, n, tmp)
                m := 0
                WHILE tmp[m][0] = '$'
                  m := m + 1
                WHILE tmp[m][0] <> '$'
                  SEQ
                    OnedimCat (trees[k][n], tmp[m], cross2[i1])
                    m := m + 1
                    i1 := i1 + 1
                n := n + 1
            cross2[i1][0] := '$'
        TRUE
          SEQ
            n := 0
            WHILE trees[k][n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE trees[k][n][0] <> '$'
              SEQ
                m := 0
                WHILE trees[l][m][0] = '$'
                  m := m + 1
                WHILE trees[l][m][0] <> '$'
                  SEQ
                    OnedimCat (trees[k][n], trees[l][m], cross2[i1])
                    m := m + 1
                    i1 := i1 + 1
                n := n + 1
            cross2[i1][0] := '$'
      IF
        k = 3
          SEQ
            n := 0
            WHILE cross1[n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE cross1[n][0] <> '$'
              SEQ
                TwodimArrayAdjust (cross2, 1, n, tmp)
                m := 0
                WHILE tmp[m][0] = '$'
                  m := m + 1
                WHILE tmp[m][0] <> '$'
                  SEQ
                    OnedimCat (cross1[n], tmp[m], tmp2)
                    ntree[i1][0], ntree[i1][1] := '(', 'C'
                    i2 := 2
                    o := 0
                    WHILE tmp2[o] <> '$'
                      SEQ
                        ntree[i1][i2] := tmp2[o]
                        i2, o := i2+1, o+1
                    ntree[i1][i2] := ')'
                    ntree[i1][i2+1] := '$'
                    m := m + 1
                    i1 := i1 + 1
                n := n + 1
            ntree[i1][0] := '$'
        TRUE
          SEQ
            n := 0
            WHILE cross1[n][0] = '$'
              n := n + 1
            i1 := 0
            WHILE cross1[n][0] <> '$'
              SEQ
                m := 0
                WHILE cross2[m][0] = '$'
                  m := m + 1
                WHILE cross2[m][0] <> '$'
                  SEQ
                    OnedimCat (cross1[n], cross2[m], tmp2)
                    ntree[i1][0], ntree[i1][1] := '(', 'C'
                    i2 := 2
                    o := 0
                    WHILE tmp2[o] <> '$'
                      SEQ
                        ntree[i1][i2] := tmp2[o]
                        i2, o := i2+1, o+1
                    ntree[i1][i2] := ')'
                    ntree[i1][i2+1] := '$'
                    m := m + 1
                    i1 := i1 + 1
                n := n + 1
            ntree[i1][0] := '$'
  : -- end of Xtrees()


  -- OrderedTrees :
  -- Returns the ordered trees of size 1 to 'size' without duplicates.
  -- Notice that we must keep at least one link free
  PROC OrderedTrees (VAL INT size, [][][]BYTE tree)
    INT i:
    SEQ
      i := 2
      SEQ o = 0 FOR Maxnodes
        SEQ p = 0 FOR Maxtrees
          SEQ q = 0 FOR Maxsize
            tree[o][p][q] := '$'   -- empty tree
      [tree[1][1] FROM 0 FOR 3] := "(C)"
      [tree[2][1] FROM 0 FOR 6] := "(C(C))"
      WHILE i < size
        SEQ
          i := i + 1   -- advance to the next tree size
          SEQ k = 0 FOR (i-1)/3 + 1
            SEQ l = k FOR (i-1-k)/2 - k + 1
              Xtrees (0, k, l, i-1-k-l, tree, tree[i])
  : -- end of OrderedTrees ()


  -- OneCentroid :
  -- Returns the one centroid free trees of size n+1. Note that the
  -- largest subtree can be at most w.
  PROC OneCentroid (VAL INT n, VAL INT w, VAL [][][]BYTE tree,
                    [][]BYTE ntree)
    INT i1, i2:
    [Maxtrees][Maxsize]BYTE tmp:
    SEQ
      i1 := 0
      SEQ i = 0 FOR n/4 + 1
        SEQ j = i FOR (n-i)/3 + 1
          SEQ
            i2 := max (j, w-i-j)
            SEQ k = i2 FOR (n-i-j)/2 + 1
              SEQ
                Xtrees (i, j, k, n-i-j-k, tree, tmp)
                i2 := 0
                WHILE tmp[i2][0] <> '$'
                  SEQ
                    ntree[i1] := tmp[i2]   -- copy each generated tree to the result
                    i1, i2 := i1+1, i2+1
  : -- end of OneCentroid()


  -- TwoCentroid :
  -- Returns the two-centroid free trees
  PROC TwoCentroid (VAL [][]BYTE tree, [][]BYTE ntree)
    INT asize:
    INT high:
    INT i1:
    SEQ
      asize := 1
      WHILE tree[asize+1][0] <> '$'
        asize := asize + 1
      i1 := 0
      SEQ i = 1 FOR asize
        SEQ j = i FOR asize-i+1
          SEQ
            high := 0
            WHILE tree[i][high] <> '$'
              high := high + 1
            tree[i][high-1] := '$'
            OnedimCat (tree[i], tree[j], ntree[i1])
            high := 0
            WHILE ntree[i1][high] <> '$'
              high := high + 1
            ntree[i1][high] := ')'
            i1 := i1 + 1
  : -- end of TwoCentroid()


  -- Main body of Paraffins()
  [Maxnodes][Maxtrees][Maxsize]BYTE HalfTrees:
  SEQ
    OrderedTrees (size / 2, HalfTrees)
    [Maxtrees][Maxsize]BYTE tree1, tree2:
    SEQ i = 1 FOR size
      IF
        i \ 2 = 1
          OneCentroid (i-1, i/2, HalfTrees, ParaffinTree[i])
        TRUE
          SEQ
            TwoCentroid (HalfTrees[i/2], tree1)
            OneCentroid (i-1, i/2, HalfTrees, tree2)
            TwodimCat (tree1, tree2, ParaffinTree[i])
:


*********************************
**** Doctor's Office Problem ****
*********************************


VAL NumPat IS 50:
VAL NumDoc IS 5:
VAL MaxList IS 100:
[MaxList]INT SickOrder, CuredOrder:
[MaxList][2]INT PatientDoctor:
INT SickOrderSize, CuredOrderSize, PatientDoctorSize:
INT Interval, SimTime:

-- Doctor's Office
PROC DoctorOffice (
    []INT SickList, [][]INT AssignedList, []INT CuredList,
    INT SickSize, INT AssignedSize, INT CuredSize, VAL INT Howlong)
  [NumPat]CHAN OF INT RecToPat, PatToRec:
  [NumDoc]CHAN OF INT RecToDoc, DocToRec:
  CHAN OF BOOL ToTimeout, FromTimeout:

  -- delay(n) : delay for n ticks
  PROC delay (VAL INT duration)
    TIMER clock:
    INT timenow:
    SEQ
      clock ? timenow
      clock ? AFTER timenow PLUS duration
  :


  -- Doctor process
  PROC Doctor (CHAN OF INT from,
               CHAN OF INT to, VAL INT maxdu, VAL INT mindu)
    BOOL continue, idle:
    INT pat, duration, range, num:
    SEQ
      continue, idle := TRUE, TRUE
      WHILE continue
        PRI ALT
          from ? pat
            IF
              pat < 0   -- Terminate signal
                continue := FALSE
              TRUE
                SEQ
                  range := maxdu - mindu
                  random (range, num)
                  duration := num + mindu
                  idle := FALSE
          (NOT idle) & SKIP
            SEQ
              delay (Interval)
              duration := duration - Interval
              IF
                duration <= 0
                  SEQ
                    to ! pat   -- assigned patient has been cured
                    idle := TRUE
                TRUE           -- still treating: avoid behaving like STOP
                  SKIP
  :


  -- patient process
  PROC Patient (CHAN OF INT from, CHAN OF INT to, VAL INT sickprob)
    BOOL continue, sick:
    INT pat, num:
    SEQ
      continue, sick := TRUE, FALSE
      WHILE continue
        PRI ALT
          from ? pat
            IF
              pat < 0   -- termination signal
                continue := FALSE
              TRUE
                sick := FALSE   -- returned from doctor's office
          (NOT sick) & SKIP
            SEQ
              random (100, num)
              IF
                num < sickprob   -- YES !! I am sick now..
                  SEQ
                    sick := TRUE
                    to ! 1   -- Go to doctor's office
                TRUE
                  delay (Interval)   -- delay a little bit
  :


  -- Receptionist process
  PROC Receptionist ()
    BOOL continue, timeout:
    VAL pqsize IS NumPat:      -- patient queue size
    VAL dqsize IS NumDoc:      -- doctor queue size
    [pqsize] INT pqueue:       -- patient queue
    [dqsize] INT dqueue:       -- doctor queue
    INT pf,pr,df,dr:           -- pointers of the queue
    INT npq,ndq:               -- Number of elements in queue
    INT pat,doc:
    INT retrydelay:            -- delay used to receive pending inputs
    SEQ
      retrydelay := 1000
      SickSize, AssignedSize, CuredSize := 0, 0, 0
      pf, pr := 0, 0           -- front and rear of patient queue
      npq := 0
      PAR i = 1 FOR NumDoc     -- Initialize doctor queue
        dqueue [i-1] := i
      df, dr := 0, 0
      ndq := NumDoc
      continue := TRUE
      WHILE continue
        PRI ALT
          FromTimeout ? timeout        -- time out signal
            continue := FALSE
          ALT                          -- input from doctors or patients
            ALT i = 0 FOR NumPat       -- input from patients
              PatToRec[i] ? pat
                SEQ
                  pat := i+1           -- patient number
                  IF
                    SickSize >= MaxList    -- reached maximum list size
                      SEQ
                        ToTimeout ! TRUE   -- terminate TimeoutProc
                        continue := FALSE
                    TRUE
                      SEQ
                        SickList [SickSize] := pat
                        SickSize := SickSize + 1
                        pqueue[pr] := pat  -- add to patient queue
                        npq, pr := npq+1, (pr+1) REM pqsize
                        IF
                          ndq > 0      -- There is an available doctor
                            SEQ
                              doc := dqueue[df]  -- delete from doctor queue
                              ndq, df := ndq-1, (df+1) REM dqsize
                              pat := pqueue[pf]  -- delete from patient queue
                              npq, pf := npq-1, (pf+1) REM pqsize
                              AssignedList [AssignedSize] [0] := pat
                              AssignedList [AssignedSize] [1] := doc
                              AssignedSize := AssignedSize + 1
                              RecToDoc[doc-1] ! pat  -- assign patient
                          TRUE
                            SKIP
            ALT i = 0 FOR NumDoc       -- input from doctors
              DocToRec[i] ? pat        -- a patient is released from doctor
                SEQ
                  CuredList [CuredSize] := pat
                  CuredSize := CuredSize + 1
                  dqueue [dr] := i+1   -- add to doctor queue
                  ndq, dr := ndq+1, (dr+1) REM dqsize
                  RecToPat [pat-1] ! pat   -- a patient is released from office
                  IF
                    npq > 0            -- there is a waiting patient
                      SEQ
                        pat := pqueue [pf]   -- delete from patient queue
                        npq, pf := npq-1, (pf+1) REM pqsize
                        doc := dqueue [df]   -- delete from doctor queue
                        ndq, df := ndq-1, (df+1) REM dqsize
                        AssignedList [AssignedSize] [0] := pat
                        AssignedList [AssignedSize] [1] := doc
                        AssignedSize := AssignedSize + 1
                        RecToDoc [doc-1] ! pat   -- assign patient
                    TRUE
                      SKIP
          TRUE & SKIP                  -- no input
            SKIP


      -- Terminate all the doctors and patients
      continue := TRUE
      WHILE continue
        PRI ALT
          ALT i = 0 FOR NumDoc         -- read pending inputs
            DocToRec[i] ? doc
              SKIP
          ALT i = 0 FOR NumPat
            PatToRec[i] ? pat
              SKIP
          TRUE & SKIP
            SEQ
              delay (Interval)
              retrydelay := retrydelay - Interval
              IF
                retrydelay < 0
                  continue := FALSE
                TRUE
                  SKIP


      -- Send termination signal to patients and doctors
      PAR i = 0 FOR NumDoc
        RecToDoc[i] ! -1
      PAR i = 0 FOR NumPat
        RecToPat[i] ! -1


  -- Generate timeout signal after given interval
  PROC TimeoutProc(VAL INT t)
    BOOL continue, term:
    INT timeout:
    SEQ
      continue := TRUE
      timeout := t
      WHILE continue
        PRI ALT
          ToTimeout ? term
            continue := FALSE
          TRUE & SKIP
            IF
              timeout < 0
                SEQ
                  continue := FALSE
                  FromTimeout ! TRUE
              TRUE
                SEQ
                  delay (Interval)
                  timeout := timeout - Interval


  -- Main Body of DoctorOffice
  PAR
    Receptionist ()
    PAR i = 0 FOR NumDoc
      Doctor (RecToDoc [i], DocToRec [i], 10000, 1000)
    PAR i = 0 FOR NumPat
      Patient (RecToPat [i], PatToRec[i], 5)   -- sick probability 5%
    TimeoutProc (Howlong)


****************************


**** Skyline Matrix ****
****************************


VAL error IS 999999.99:
VAL N IS 20:


-- Skyline :
-- Take an integer, two profile vectors, a two-dimensional array and
-- a column vector as inputs. Returns a set of solutions of linear
-- equations.


PROC Skyline(VAL INT n, VAL []INT pi, VAL []INT pj,
             VAL [][]REAL32 ain, VAL []REAL32 bin, []REAL32 sol)


  -- FormSkyline
  -- Given two profile vectors and n x n matrix, returns the nonzero
  -- elements of the lower and upper triangles of A in two
  -- two-dimensional arrays. Note that upper triangle is reflected
  -- about the diagonal.


  PROC FormSkyline(VAL INT n, VAL []INT I, VAL []INT J, VAL [][]REAL32 A,
                   [][]REAL32 L, [][]REAL32 U)
    SEQ
      SEQ k=1 FOR N
        SEQ l=1 FOR N
          L[k][l], U[k][l] := error, error   -- fill error values
      SEQ k=1 FOR n
        SEQ l=I[k] FOR k-I[k]+1
          L[k][l] := A[k][l]
      SEQ k=1 FOR n
        SEQ l=J[k] FOR k-J[k]+1
          U[k][l] := A[l][k]


  -- ZeroFill :
  -- Append zeros to the front of 'a' if the lower limit of 'b' is less
  -- than the lower limit of 'a'.


  PROC ZeroFill(VAL []REAL32 a, VAL []REAL32 b, []REAL32 c)
    INT la,lb:
    SEQ
      la, lb := 1, 1
      WHILE a[la] <> error
        la := la+1
      la := la-1
      WHILE b[lb] <> error
        lb := lb+1
      lb := lb-1
      c := a
      IF
        lb < la
          SEQ i=lb FOR la-lb
            c[i] := 0.0
        TRUE
          SKIP
  -- ElimL :
  -- Take jth row of L and the matrix of deltas, and returns
  -- row - (jth row of delta).


  PROC ElimL(VAL INT j, []REAL32 row, VAL [][]REAL32 delta, []REAL32 res)
    INT k:
    SEQ
      SEQ i=0 FOR N
        res[i] := error    -- initially assign error values
      k := 1
      WHILE row[k] = error
        k := k + 1
      WHILE row[k] <> error
        SEQ
          IF
            delta[j][k] = error
              res[k] := row[k]
            TRUE
              res[k] := row[k] - delta[j][k]
          k := k + 1
  -- ElimU :
  -- Take jth row of U and the matrix of deltas, and returns
  -- row - (jth column of delta).


  PROC ElimU(VAL INT j, []REAL32 row, VAL [][]REAL32 delta, []REAL32 res)
    INT k:
    SEQ
      SEQ i=0 FOR N
        res[i] := error    -- initially assign error values
      k := 1
      WHILE row[k] = error
        k := k + 1
      WHILE row[k] <> error
        SEQ
          IF
            delta[k][j] = error
              res[k] := row[k]
            TRUE
              res[k] := row[k] - delta[k][j]
          k := k + 1
  -- Eliminate
  -- Take as input an integer, a lower and upper triangular array, and
  -- a column vector. Eliminate the upper triangular array, and return
  -- the adjusted lower triangular array and column vector. Note again
  -- that the rows of U are the columns of A above the diagonals.


  PROC Eliminate (VAL INT n, VAL [][]REAL32 Lin, VAL [][]REAL32 Uin,
                  VAL []REAL32 bin, [][]REAL32 Lout, []REAL32 bout)
    INT i,i1,i2:
    [N][N] REAL32 L,U,delta:
    [N] REAL32 b, Lrow, Urow:
    SEQ
      i := n
      L, U, b := Lin, Uin, bin
      SEQ o=0 FOR N
        SEQ p=0 FOR N
          delta[o][p] := error
      WHILE i > 1
        SEQ
          i1 := 1
          WHILE U[i][i1] = error
            i1 := i1 + 1
          WHILE U[i][i1+1] <> error
            SEQ
              i2 := 1
              WHILE L[i][i2] = error
                i2 := i2 + 1
              WHILE L[i][i2+1] <> error
                delta[i1][i2] := x * y / L[i][i]
          SEQ j=1 FOR i
            SEQ
              SEQ p=0 FOR N
                Lrow[p], Urow[p] := error, error
              ZeroFill(L[j], L[i], Lrow)
              ZeroFill(U[j], U[i], Urow)
              ElimL (j, Lrow, delta, L[j])
              ElimU (j, Urow, delta, U[j])
          SEQ j=1 FOR i
            IF
              U[i][j] <> error
                b[j] := b[j] - b[i] * U[i][j] / L[i][i]
              TRUE
                SKIP
          Lout[i] := L[i]
          bout[i] := b[i]
          i := i-1
  :   -- end of Eliminate


  -- Main Body of Skyline()
  [N][N]REAL32 Lin, Uin, L:
  [N]REAL32 b:
  INT row, col:
  REAL32 x:
  SEQ
    FormSkyline(n, pi, pj, ain, Lin, Uin)
    Eliminate(n, Lin, Uin, bin, L, b)
    row, col := n, 0
    x := b[n] / L[n][1]
    WHILE row > 1
      SEQ
        row := row - 1
        col := col + 1
        SEQ i=1 FOR row
          IF
            L[i][col] <> error
              b[i] := b[i] - x * L[i][col]
            TRUE
              SKIP
        x := b[row] / L[row][col+1]
        sol[col] := x


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved. 263 


Program Composition Notation 


K. Mani Chandy 
Stephen Taylor 
California Institute of Technology 
Pasadena, CA 91125 


1. Introduction


Program Composition Notation (PCN) is a notation for composing 
programs. The programs that are composed may be expressed in base 
languages (such as Fortran, Lisp, or Strand) or in PCN itself. The PCN 
research effort has a narrow focus; our goals are described next. 


Program Composition Operators: The traditional method of construct- 
ing programs is by sequential composition. We are evaluating the the- 
sis that other forms of program composition are helpful, and that pro- 
grammers should be able to define their own composition operators 
for program composition. A goal of our effort is to identify proof rules 
for the composition operators, and to evaluate their efficacy in reason- 
ing about programs. 


Concurrent Computer Implementations: We seek to develop programs 
that execute efficiently on multicomputers (i.e., distributed memory 
machines), shared memory multiprocessors, and SIMD machines. A 
project on implementing PCN on multicomputers is underway at 
Caltech. A variant of this notation has been developed and is being 
implemented on the Connection Machine at UCLA by Rajive Bagrodia. 


Methodology: We wish to propose a unifying framework for developing
numeric, symbolic and reactive programs. Our intent is to develop
programs by stepwise refinement, starting with programs with simpler
proofs and refining them if they are not adequately efficient. A goal
is to use existing libraries of programs in different base languages
in program development.


PCN is an outgrowth of UNITY [1] and Strand [3]. PCN is influenced 
by Hoare’s pioneering work on composition [4], though our goals are 
more limited than Hoare’s; our effort has the three goals listed above, 
whereas Hoare is concerned with a formal notation for describing all 
(discrete) systems. 


We have taken liberties with the notation by using symbols that are
not on standard computer keyboards. Our intent is to describe program
composition operators rather than to describe a specific language.


2. Data Types 


PCN has the basic data types: boolean, integer, single and double 
precision floating point number, character, and string. The initial 
value of a variable of a basic data type is arbitrary. 


2.1. The Synchronization Type 


PCN has a data type called synch, for synchronization. The initial
value of a synch variable is a special symbol, φ, that indicates that the
variable is undefined. Programmers have the obligation of proving that
a synch variable is assigned at most one value. Given such a proof, a
synch variable is either undefined (φ), or it is defined (by its non-φ
value) and its definition remains unchanged. A synch variable can be
assigned a value of any type.


A variable that is not declared is assumed to be a synch type; the 
scope of the variable is the block in which it is named. 




2.2. The Tuple Data Type 


PCN has a data type called tuple which is a sequence of items between
braces, {. . .}. The elements of a tuple can be of arbitrary data
types including tuples themselves. The i-th element of a tuple x is
denoted by x[i]. The time taken to access the i-th element of a tuple is
independent of i. A tuple is identical to a one-dimensional array,
except that the elements of a tuple need not all be of the same type.
The function sizeof applied to a tuple returns its size. A tuple can be
defined by enumeration, as in {5, 3, 4}, or by quantification, as
described next.


2.3. Quantification 


A sequence of items can be defined using quantification. A
quantified-form has the syntax:


« identifier in integer-expression .. integer-expression :: lexical-unit »


The quantified-form, «i in n..m :: exp», is a sequence in which
the lexical unit exp appears once for each value of i where n ≤ i ≤ m,
and in the appearance for i = k, all instances of i in exp are
replaced by k.


Examples -


•  «i in 0..2 :: x[i] + y[i]» is the sequence
   x[0] + y[0], x[1] + y[1], x[2] + y[2].


•  «i in 0..2 :: 5» is the sequence 5, 5, 5.

•  «i in 0..2 :: x[i] := y[i], u[i] := v[i]»
   is the sequence of statements:
   x[0] := y[0], u[0] := v[0],
   x[1] := y[1], u[1] := v[1],
   x[2] := y[2], u[2] := v[2].
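
A quantified form behaves much like a comprehension over an inclusive
integer range. The following minimal Python sketch illustrates the first
two examples; the helper name quantify is hypothetical and is not part
of PCN.

```python
def quantify(n, m, exp):
    """Expand «i in n..m :: exp» into a list; the range n..m is inclusive."""
    return [exp(i) for i in range(n, m + 1)]

x = [10, 20, 30]
y = [1, 2, 3]

# «i in 0..2 :: x[i] + y[i]»
sums = quantify(0, 2, lambda i: x[i] + y[i])

# «i in 0..2 :: 5»
fives = quantify(0, 2, lambda i: 5)

print(sums, fives)
```

An empty range (m less than n) yields the empty sequence in this sketch.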


A tuple can be defined by a quantified-form between braces; for
example: {«i in 0..2 :: x[i]»} is the tuple {x[0],x[1],x[2]}.


Quantified forms can be embedded one in another. The notation
«i,j in n..m :: exp» is equivalent to


«i in n..m :: «j in n..m :: exp»».

And the notation «i in ni..mi, j in nj..mj :: exp» is equivalent to
«i in ni..mi :: «j in nj..mj :: exp»».

Guarded Expressions: A guarded expression has the syntax,


guard → expression


where guard is a Boolean expression. The value of a guarded
expression, g → e, is defined as follows. If g holds, then the value of
g → e is the value of e; if ¬g, then the value of g → e is empty, where
the symbol empty is the identity element in concatenation (of
sequences). For example


{« i,j in 0..1 :: (i ≤ j) → l[i,j], (i > j) → u[i,j] »}
is the tuple
{ l[0,0], l[0,1], u[1,0], l[1,1] }.
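
The effect of guards inside a quantified form can be sketched in Python
as a nested loop in which a failed guard contributes nothing to the
result. The function name guarded_pairs and the tagging of elements as
("l", i, j) or ("u", i, j) are illustrative assumptions.

```python
def guarded_pairs(lo, hi):
    """Expand «i,j in lo..hi :: (i <= j) -> l[i,j], (i > j) -> u[i,j]».

    Each element is tagged with the array it would be read from; a guard
    that does not hold contributes the empty sequence.
    """
    out = []
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            if i <= j:
                out.append(("l", i, j))
            if i > j:
                out.append(("u", i, j))
    return out

print(guarded_pairs(0, 1))
```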


Dyadic Operators on Quantified Expressions: A reduced-form has the 
syntax: 


dyadic-operator over quantified-form 


and the syntactic unit in the quantified form must be a (guarded or 
unguarded) Boolean, arithmetic, or string expression. If the quantified 
form is the empty sequence, the reduced-form is the identity element 
of the dyadic operator, otherwise it is: 


t[0] V t[1] V ... V t[k]


where the sequence specified by the quantified form is t[0] ... t[k],
and V is the dyadic operator.


Examples -

•  $\sum_{i=n}^{m} A[i]$ = + over «i in n..m :: A[i]»

•  $\prod_{i=n}^{m} A[i]$ = * over «i in n..m :: A[i]»

•  $\bigvee_{i=n}^{m} A[i]$ = ∨ over «i in n..m :: A[i]»

•  + over « i,j in 0..2 :: i > j → u[i,j], i < j → l[i,j] » =
   l[0,1] + l[0,2] + u[1,0] + l[1,2] + u[2,0] + u[2,1].
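
A reduced-form corresponds to a fold with an explicit identity element.
The sketch below, in Python, reproduces the last example with small
made-up matrices l and u; the name reduce_over and the matrix contents
are assumptions chosen only for illustration.

```python
from functools import reduce
import operator

def reduce_over(op, identity, seq):
    """Fold op over seq; an empty sequence yields the operator's identity."""
    return reduce(op, seq, identity)

# Hypothetical data: l holds entries above the diagonal, u below it.
l = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
u = [[0, 0, 0], [4, 0, 0], [5, 6, 0]]

# « i,j in 0..2 :: i > j -> u[i,j], i < j -> l[i,j] »
terms = []
for i in range(3):
    for j in range(3):
        if i > j:
            terms.append(u[i][j])
        if i < j:
            terms.append(l[i][j])

total = reduce_over(operator.add, 0, terms)
print(terms, total)
```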


2.4. Arrays, Records and Sets 


Complex types can be created from other types by means of arrays, 
and records (structures). 


The notation supports the data types of sets of basic types and sets 
of records of basic types (but not sets of synch types). Sets of tuples 
are used for relations. Set operations are membership, insertion, 
deletion, difference, intersection, and union. 


PCN also allows quantifications over sets. Quantifications over sets
have the same syntax as quantification over sequences, except that the
range n..m is replaced by a set identifier. The quantified forms, in this
case, represent sets. The dyadic operators on quantified forms that
represent sets must be associative and commutative. A set can be
enumerated or defined using quantified forms, in the same way as
tuples, except that different braces are used to enclose a sequence. Sets
are implemented by hashing. Sets are not discussed further in this
paper.


3. Composition


Programs are constructed by composing blocks. The operators
used in composing programs are either primitive operators or
user-defined operators. In this paper we restrict attention to four
primitive program composition operators: sequential (;), parallel (||),
choice (?), and fair (~) composition.


Parameters are passed by reference. A program has a declaration 
and a block as in C. A block may have a declaration of local variables, 
as in C. A block has a body; its syntax is described next in BNF. All 
nonterminal symbols are in italics, and all terminal symbols are in 
plain or boldface type. The notation <su>, where su is a syntactic unit,
represents a list of zero or more instances of the syntactic unit where
multiple instances are separated by commas.


block :: assignment |
         procedure-call |
         guard → block |
         {primitive-composition-operator <block>} |
         {user-defined-composition-operator <argument>}


3.1. Assignment 


Expressions: The expressions on the right hand side of an assignment 
are as in an imperative notation except that an expression may be a 
tuple, or an expression may include terms that are reduced expres- 
sions. The elements of a tuple on the right-hand side of an assign- 
ment can be constants or variables, but not expressions. 


An assignment x := e is executed as follows. Consider two cases:


Case 1 Variable x is of type synch and e is a variable or a tuple. 


Case 2 Variable x is an ordinary variable or e is an expression other 
than a variable or a tuple. 


Case 1: The value of x becomes the expression, e', where e' is obtained
from e by replacing all instances of non-synch variables in e by their
current values. In all subsequent evaluations of expressions or pattern
matches that name x, all instances of x are replaced by the expression
e'. For example, the execution of the assignment, x := {y,z}, where
x and y are synch variables, and z is an ordinary variable with value 0,
defines x as {y,0}.


Case 2: The value of e is determined, as described later; if the value of
e is not φ, x gets the value of e, otherwise the assignment is executed
again.


(Informally speaking, the assignment is executed as follows: wait
until the value of e is non-φ; x gets the non-φ value of e.)


The value of an expression, e, is determined as follows: If e names a
variable with value φ, the value of e is φ. If e does not name a variable
with value φ, then the value of e is equal to the value of an expression
e', where e' is obtained from e as follows: Replace all ordinary
variables in e by their (current) values and replace all synch
variables in e by their definitions. The value of e is determined by
determining the value of e'. The evaluation of an expression terminates if
the value of the expression is φ or if the expression names only
constants.


The evaluation of an expression will not terminate if circular defi- 
nitions are used. An example of a circular definition is: x is defined as 
y, and y is defined as x, where x and y are synch variables. 
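
The waiting behaviour of an assignment whose right-hand side names an
undefined synch variable can be sketched in Python with a write-once
cell. This is only an analogy under stated assumptions: the class name
Synch and its methods are hypothetical, and the run-time check for a
second definition stands in for the proof obligation that PCN places on
the programmer.

```python
import threading

class Synch:
    """Write-once cell: undefined until define() is called exactly once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._defined = threading.Event()
        self._value = None

    def define(self, value):
        # Assumption: a second definition is reported as an error here.
        with self._lock:
            if self._defined.is_set():
                raise RuntimeError("synch variable already defined")
            self._value = value
            self._defined.set()

    def read(self):
        # Blocks (the reader "suspends") until the variable is defined.
        self._defined.wait()
        return self._value

s = Synch()
results = []

# Plays the role of x := s + 1, which waits for s to become defined.
reader = threading.Thread(target=lambda: results.append(s.read() + 1))
reader.start()
s.define(41)
reader.join()
print(results)
```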

3.2. Guarded Blocks


The meaning of a block which has the syntax


guard → block


is given next. The syntax of a guard is:


guard      :: predicate |
              match-list |
              match-list, predicate
match-list :: <match>
match      :: {<pattern-identifier>} ◁ variable-identifier


A guard evaluates to one of three values: suspends, success, or failure. 
A guard is evaluated as follows. The match list (if any) in a guard is 
evaluated before the predicate part. The match list of the guard is 
discussed next. 


Pattern Matches: A match is a syntactic convenience to refer to ele- 
ments of tuples. The pattern-identifiers in a pattern are not variables. 
Pattern-identifiers need not be declared in the program heading. All 
pattern-identifiers in a pattern match list must be distinct. 


The pattern-match list is evaluated from left to right. Let the next
pattern-match evaluated in the evaluation of a guard, g, be:


{t1, ..., tk} ◁ x


where ti is a pattern-identifier, for all i. Evaluation of a guard
continues as follows:


•  If x evaluates to φ then g evaluates to suspends, and
   evaluation of g terminates.


•  If x evaluates to a k-tuple then all instances of the
   pattern identifier, ti, to the right of the pattern match in
   the statement (i.e., in the guard and in the block) are
   replaced by x[i], and the evaluation of g continues with
   the next pattern match, if any, in the pattern match
   list; if all pattern matches in the guard have been
   evaluated, guard evaluation continues with the evaluation of
   the predicate: if the predicate evaluates to φ, then the
   guard evaluates to suspends; if the predicate evaluates
   to true then the guard evaluates to success; if the
   predicate evaluates to false, then the guard evaluates to
   failure.

•  Otherwise (i.e., if x evaluates to a value that is not φ and
   not a k-tuple) g evaluates to failure, and evaluation of g
   terminates.


The execution of a block b with syntax g → c is as follows. If g
evaluates to:


•  success then execute c (block b terminates when c
   does)
•  failure then skip (the execution of b terminates)
•  φ then execute b (execute block b again)


Informally, the execution of block b is: wait until g has a non-φ value; if
g has value success then execute c else skip.
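
The three-valued evaluation of a guard with one pattern match and a
predicate can be sketched as follows. This is a minimal Python
illustration: the sentinel UNDEFINED stands for the undefined value of a
synch variable, tuples are modelled as Python tuples, and the function
name eval_guard is an assumption.

```python
UNDEFINED = object()  # stands for the undefined value of a synch variable

def eval_guard(x, k, predicate):
    """Evaluate a guard consisting of one k-element pattern match on x
    followed by a predicate over the matched elements."""
    if x is UNDEFINED:
        return "suspends"
    if not (isinstance(x, tuple) and len(x) == k):
        return "failure"
    p = predicate(*x)          # pattern identifiers are bound to x[i]
    if p is UNDEFINED:
        return "suspends"
    return "success" if p else "failure"

# {h, t} matched against a two-element tuple, with predicate h > 0
print(eval_guard((1, ()), 2, lambda h, t: h > 0))
print(eval_guard(UNDEFINED, 2, lambda h, t: h > 0))
print(eval_guard((), 2, lambda h, t: h > 0))
```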


The meaning of an unguarded block is the same as that of a guarded
block with a guard that always evaluates to success. Therefore, the
proof rules for block true → c, and for block c, are identical.


3.3. Sequential Composition 


The execution of a block {; <block>} is as follows: the blocks
composed by sequential composition are executed in sequence. A
sequential composition block terminates when the last block in its
block-list, <block>, terminates.


3.4. Parallel Composition 


The execution of the block, {|| <block>}, is as follows: all blocks
composed by parallel composition are executed in parallel. A parallel
composition block terminates when all blocks that it composes
terminate.


Blocks that are executed in parallel may read and write common 
variables. Reading or writing one item of a basic data type is an atomic 
action. 


An interleaving semantics is used to reason about parallel composi- 
tion. The following progress condition is guaranteed: at all points in 
the computation, for each block composed by parallel composition, 
the block has terminated or a statement in the block will be executed 
eventually. 


3.5. Choice Composition 


The execution of the block {? <block>} is described in this section.
Restrict attention to choice blocks of the form: {? <guard → block>}.


(An unguarded block is treated as a guarded block with a true guard.)
The execution of a choice block is as follows:


1. If at least one guard in the choice block evaluates to 
success then execute any block with a guard that 
evaluates to success. 


2. If all guards in the choice block evaluate to failure then 
skip. 


3. Otherwise (i.e., if no guard in the choice block evaluates
   to success, and at least one guard evaluates to suspends)
   then repeat execution of the choice block.


The execution of a choice block terminates if the evaluation of a
guard succeeds and the block corresponding to the guard terminates,
or if evaluation of all guards in the block list fails.
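
One execution step of a choice block, following the three rules above,
can be sketched in Python. The representation of a guarded block as a
pair (guard function, block function) and the returned status strings
are assumptions made for this illustration; the nondeterministic
selection is modelled by a random pick.

```python
import random

def choice_step(alternatives, rng=random):
    """alternatives: list of (guard, block) pairs. Each guard is a function
    returning 'success', 'failure' or 'suspends'; each block is a function.
    Returns 'executed', 'skipped', or 'retry' (repeat the choice block)."""
    results = [guard() for guard, _ in alternatives]
    ready = [block for (_, block), r in zip(alternatives, results)
             if r == "success"]
    if ready:                                  # rule 1
        rng.choice(ready)()
        return "executed"
    if all(r == "failure" for r in results):   # rule 2
        return "skipped"
    return "retry"                             # rule 3

log = []
alts = [
    (lambda: "failure", lambda: log.append("a")),
    (lambda: "success", lambda: log.append("b")),
]
status = choice_step(alts)
print(status, log)
```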




3.6. Fair Composition 


Fair composition is identical to choice composition except that if 
control flows from a fair composition block to itself an infinite number 
of times, then each of the guards in the list of blocks composed by fair 
composition is evaluated infinitely often. 


Consider the following programs that are identical except that the 
first is defined by choice composition, and the second by fair compo- 
sition. 


p(x)                                  p(x)
integer: x;                           integer: x;
{?                                    {~
   true → {; q(x), p(x)}                 true → {; q(x), p(x)}
   true → {; r(x), p(x)}                 true → {; r(x), p(x)}
}                                     }


Program p executes either q or r and then recursively calls itself.


The choice composition program allows more computations than
the fair composition program. In the fair composition program both q
and r are executed infinitely often; thus, it is always the case that q
will be executed eventually, and r will be executed eventually. In the
choice composition program, q is executed infinitely often or r is
executed infinitely often. Therefore, the choice composition also allows
computations in which either q or r is never executed after some
point in the program. Note that the only reason that we employ both
choice and fair composition operators is that a program implemented
by using choice composition is more efficient than one using fair
composition. Also, for many programs, choice composition suffices.


The meaning of control flow from one block to another is defined
in terms of composition operators in the Appendix. Control flow has
the obvious intuitive meaning from imperative programming: control
flows from a block to the next one in sequential composition, from a
block to all those spawned by parallel composition, and from a choice
or fair composition block to the one that is selected for execution.


4. Examples


In this section a few programs are developed to illustrate the com- 
position operators. Program text is in the courier font, while docu- 
mentation within programs is in the bookman font. We begin with a 
simple program flip that has two integer arguments, and flip inter- 
changes the values of its arguments. The body of the program is a se- 
quential composition block. 


flip(u,v)
integer: u, v;                  Declare parameters of program.
{;                              Begin sequential composition block.
   integer: w;                  Declare local variables of block.
   w := u, u := v, v := w       Block-list
}                               End sequential composition block and end
                                program.


Next consider a program f with three arguments, an integer n, a
one-dimensional array x indexed i where 0 ≤ i < n, and an index j
where 0 < j < n. The program flips x[j-1] and x[j] if they are not in
ascending order.


f(n,j,x)
integer: n, j;
array [0 .. n-1] of integer: x;
(x[j-1] > x[j]) → flip(x[j-1],x[j])      Single statement program.


Next we write programs, s0, s1, s2, each with two arguments: n
and x where x is an array indexed [0 .. n-1], and where the
postcondition of each program is that x is in ascending order. For
purposes of exposition assume that the elements of x are distinct.


4.1. Simple Sort 


Choice Composition: As in UNITY, the simplest sorting routine is to 
flip any pair of elements of x that are out of order, and then repeat the 
sort procedure. 


s0(n,x)
integer: n;
array [0 .. n-1] of integer: x;
{? « i in 1..n-1 ::
       (x[i-1] > x[i]) → {; flip(x[i-1],x[i]), s0(n,x)} »
}
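
The behaviour of s0 can be imitated sequentially in Python: repeatedly
pick any adjacent out-of-order pair and flip it until none remains. The
recursion of s0 is replaced by a loop and the nondeterministic choice by
a random pick; both are assumptions of this sketch.

```python
import random

def s0(x, rng=random):
    """Flip any adjacent out-of-order pair until x is ascending (in place)."""
    while True:
        bad = [i for i in range(1, len(x)) if x[i - 1] > x[i]]
        if not bad:          # all guards fail: the choice block skips
            return x
        i = rng.choice(bad)  # any guard that succeeds may be selected
        x[i - 1], x[i] = x[i], x[i - 1]

print(s0([3, 1, 2, 5, 4]))
```

Each flip removes exactly one inversion, so the loop terminates whichever
pair is chosen.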


Sequential Composition: The bubble sort is defined in the obvious way 
using sequential composition. 


s1(n,x)
integer: n;
array [0 .. n-1] of integer: x;
{; « t in 1 .. n-1, i in 1 .. n-t :: f(n,i,x) »}


Parallel Composition: The odd-even transposition sort is defined using 
sequential and parallel composition. On every odd step, for all odd i, 
x[i-1] and x[i] are flipped if they are out of order, and on even steps 
the same is done for even i. 


s2(n,x)
integer: n;
array [0 .. n-1] of integer: x;
{; « t in 0 .. n ::
       {|| « i in 1 .. n-1 :: ((t-i) mod 2 = 0) → f(n,i,x) »}
   »
}
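
A sequential Python rendering of the odd-even transposition sort is
sketched below. Within one step the selected pairs are disjoint, so
executing them one after another gives the same result as executing them
in parallel. The step range 0..n and the index range 1..n-1 are
assumptions consistent with the precondition of f.

```python
def f(j, x):
    """Flip x[j-1] and x[j] if they are not in ascending order."""
    if x[j - 1] > x[j]:
        x[j - 1], x[j] = x[j], x[j - 1]

def s2(x):
    n = len(x)
    for t in range(0, n + 1):          # sequential composition over steps
        for i in range(1, n):          # pairs of one step are disjoint
            if (t - i) % 2 == 0:
                f(i, x)
    return x

print(s2([5, 4, 3, 2, 1]))
```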


Next we present the odd-even transposition sort program using
synch variables. The parameters of the program are n and two arrays,
x and y, indexed i, where 0 ≤ i < n. The program does not modify n or
x. The postcondition of the program is that y is the sort of x. The
program uses a local array z indexed t, i where 0 ≤ t < n+2.


Specification: The specification of the program is the following set of
equations:

•  ∀ i where 0 ≤ i ≤ n-1 ::  z[0, i] = x[i]

•  ∀ i where 0 ≤ i ≤ n-1 ::  y[i] = z[n+1, i]

•  ∀ i, t where 0 ≤ i ≤ n-1, and where 0 ≤ t ≤ n ::

      z[t+1, i] = min(z[t, i], z[t, i+1])   if (i - t) mod 2 = 0,
                  max(z[t, i-1], z[t, i])   if (i - t) mod 2 ≠ 0

Program: The program is a syntactic transformation of the
specification:

s3(n,x,y)
integer: n;
array [0 .. n-1] of integer: x, y;
{|| array [0 .. n+1, 0 .. n-1] of synch integer: z;
    « i in 0 .. n-1 :: z[0, i] := x[i],
                       y[i] := z[n+1, i],
        « t in 0 .. n :: z[t+1, i] :=
            ((i-t) mod 2 = 0 → min (z[t, i], z[t, i+1]),
             (i-t) mod 2 ≠ 0 → max (z[t, i], z[t, i-1]))
        »»
}
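
The equations for z can be evaluated row by row, which the following
Python sketch does. Because every element of z is defined exactly once,
the order of evaluation does not affect the result. The treatment of the
boundary elements (an element with no partner at an array edge is copied
unchanged) is an assumption of this sketch; the specification above does
not state it.

```python
def s3(x):
    """Return y, the sort of x, by filling the table z[t][i] for t = 0..n+1."""
    n = len(x)
    z = [list(x)]                      # z[0, i] = x[i]
    for t in range(n + 1):
        prev = z[t]
        row = list(prev)               # assumption: unpaired elements copy
        for i in range(n):
            if (i - t) % 2 == 0:
                if i + 1 < n:
                    row[i] = min(prev[i], prev[i + 1])
            else:
                if i - 1 >= 0:
                    row[i] = max(prev[i], prev[i - 1])
        z.append(row)
    return z[n + 1]                    # y[i] = z[n+1, i]

print(s3([5, 4, 3, 2, 1]))
```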


5. Hamming’s Problem 


Given a tuple p of relatively prime numbers, and a positive integer 
n, produce an output z which is a list of multiples of numbers in p. 
List z has n elements and is in increasing order. 


The program uses the following local variables: Let m be the size of
tuple p. Variable x is an array [0 .. m-1] of tuple, and d, y are tuples.
The elements of array x, and variables d, y are tuples that represent
lists.


Figure 1 - Hamming's Problem, Extended


The program is a parallel composition of (Figure 1):


•  A program, mult, that defines an element of list x[i] as
   the product of the corresponding element of list z and
   p[i]. There is one mult program for each i.


•  A program, merge, that merges the m lists in x to
   produce the list y.


•  List d is defined as the number 1 followed by list y, i.e.,
   d = {1,y}.


•  A program firstn that defines the output sequence z as
   the first n elements of list d.


hamming(p,n,z)
{||
    array [0 .. m-1] of synch: x;
    « i in 0 .. m-1 :: mult(z,p[i],x[i]) »,
    merge(x,m,0,m,y),
    d := {1,y},
    firstn(d,n,z)
}


mult: The first argument of mult is an input list z, and its second is an 
input integer, q; its output argument is a list v, obtained by multiplying 
each element of z by q. 


The value of v in program mult (z,q,v) is defined as follows: 


1. If z is the empty list, then v is the empty list. 


2. If the head and tail of z are hz and tz, respectively, 
then v = {w,tv}, where w = hz * q, and where tv is 
defined by mult (tz,q,tv). 


mult(z,q,v)
integer: q;
{?  {} ◁ z → v := {}
   ,{hz,tz} ◁ z → {|| v := {w,tv}, w := hz * q, mult(tz,q,tv) }
}


merge: Program merge (Figure 2) has the following input parameters:
the array, x, the integer m, and two indices lo and hi. Program merge
merges the lists x[lo],...,x[hi-1]. It has a single output parameter
q, the merged list.


The program calls another program m2 which merges two lists. 
Program merge is implemented as a balanced binary tree of m2 pro- 
grams. 


The output of program merge is a function of its inputs. Output q is 
defined as follows: 


1. If hi - lo = 1, then the output list q is the same as the
   input list x[lo].


2. If hi - lo = 2, then the output list q is defined by
   m2(x[lo],x[lo+1],q), where m2 merges its two input lists
   x[lo] and x[lo+1] to produce its output list q.


3. If hi - lo > 2, then output list q is defined as follows:

   Define mid as (lo + hi)/2. Note: lo < mid < hi.
   Define lopart as merge(x,m,lo,mid,lopart).
   Define hipart as merge(x,m,mid,hi,hipart).
   Define q as m2(lopart,hipart,q).


   Output q is a merge of lopart and hipart, where
   lopart and hipart are merges of the lists in x[i] for
   lo ≤ i < mid, and mid ≤ i < hi, respectively.


Figure 2 - The Merge Function


merge(x,m,lo,hi,q)
integer: m, lo, hi;
array [0 .. m-1] of synch tuple: x;
synch tuple: q;
{?  (hi - lo) = 1 → q := x[lo]
   ,(hi - lo) = 2 → m2(x[lo], x[lo+1], q)
   ,(hi - lo) > 2 → {|| synch integer: mid;
                        mid := (lo + hi)/2
                       ,merge(x,m,lo,mid,lopart)
                       ,merge(x,m,mid,hi,hipart)
                       ,m2(lopart,hipart,q)
                    }
}


m2: Program m2 has two input tuples u and v, and a single output 
tuple w. Tuples u, v and w represent lists. Output w is defined as 
follows: 


1. If an input, u or v, is the empty list, then w is the empty
   list.


2. If u = {hu,tu} and v = {hv,tv}, then there are 3 cases:


   If hu < hv, then w = {hu,tw}, where tw is defined by
   m2(tu,v,tw).

   If hv < hu, then w = {hv,tw}, where tw is defined by
   m2(u,tv,tw).

   If hu = hv, then w = {hu,tw} where tw is defined by
   m2(tu,tv,tw).


m2(u,v,w)
{?  ({} ◁ u) → w := {}
   ,({} ◁ v) → w := {}
   ,({hu,tu} ◁ u), ({hv,tv} ◁ v) →
       {?  hu < hv → {|| w := {hu,tw}, m2(tu,v,tw) }
          ,hu > hv → {|| w := {hv,tw}, m2(u,tv,tw) }
          ,hu = hv → {|| w := {hu,tw}, m2(tu,tv,tw) }
       }
}
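
The programs mult, m2 and merge can be mimicked in Python with
generators, which produce list elements on demand in roughly the way
that synch tuples become defined incrementally. This is a sketch under
assumptions: lists are modelled as Python iterators, the parameter m is
dropped because Python lists carry their own length, and m2 follows the
definition above in ending its output as soon as either input ends.

```python
from itertools import count, islice

def mult(z, q):
    """Yield each element of z multiplied by q."""
    for hz in z:
        yield hz * q

def m2(u, v):
    """Merge two increasing iterators, emitting equal heads once.
    As in the text, the output ends when either input is exhausted."""
    u, v = iter(u), iter(v)
    try:
        hu, hv = next(u), next(v)
        while True:
            if hu < hv:
                yield hu
                hu = next(u)
            elif hv < hu:
                yield hv
                hv = next(v)
            else:
                yield hu
                hu = next(u)
                hv = next(v)
    except StopIteration:
        return

def merge(x, lo, hi):
    """Merge x[lo], ..., x[hi-1] with a balanced binary tree of m2 calls."""
    if hi - lo == 1:
        return iter(x[lo])
    if hi - lo == 2:
        return m2(x[lo], x[lo + 1])
    mid = (lo + hi) // 2
    return m2(merge(x, lo, mid), merge(x, mid, hi))

# Multiples of 2, 3 and 5 as three unbounded increasing streams.
streams = [mult(count(1), q) for q in (2, 3, 5)]
first8 = list(islice(merge(streams, 0, 3), 8))
print(first8)
```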


firstn: Program firstn has inputs, list d and integer n, and it has 
output list z. The output is the first n elements of d. The output is 
defined as follows: 


1. If n = 0 or d is empty, then z is empty.


2. If n > 0 and d = {hd,td}, then z = {hd,tz} where tz
   is defined by firstn(td,n-1,tz).


firstn(d,n,z)
integer: n;
{?  (n = 0) → z := {}
   ,(d = {}) → z := {}
   ,({hd,td} ◁ d), (n > 0) →
       {|| z := {hd,tz}, firstn(td,m,tz), m := n-1}
}
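
The complete dataflow of the hamming program can be imitated
sequentially. In the Python sketch below, which is an equivalent
formulation rather than a transcription of the PCN program, ptr[i]
records how far the i-th mult has read the output list z, taking the
minimum of the current heads plays the role of merge, and the loop bound
plays the role of firstn.

```python
def hamming(p, n):
    """Return the first n elements of the list that starts with 1 and
    continues with the merged multiples p[i] * z[k] in increasing order."""
    z = [1]                      # d = {1, y}
    ptr = [0] * len(p)           # ptr[i]: next element of z that mult i reads
    while len(z) < n:
        heads = [z[ptr[i]] * p[i] for i in range(len(p))]
        nxt = min(heads)         # merge: smallest head of the m lists
        z.append(nxt)
        for i, h in enumerate(heads):
            if h == nxt:         # equal heads are emitted once, as in m2
                ptr[i] += 1
    return z[:n]                 # firstn

print(hamming((2, 3, 5), 10))
```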


6. Paraffins


The interconnection structure of the carbon atoms of a paraffin
forms a free tree in which a vertex has at most four neighbors. A free
tree has one centroid or two centroids that are neighbors. Therefore,
a free tree in which each vertex has at most four neighbors can be
represented by (Figure 3):


1. A single centroid connected to at most four rooted 
trees, where each vertex in each rooted tree has at 
most 3 sons, or 


2. Two centroids, each of which is a rooted tree, where 
each vertex in each rooted tree has at most 3 sons. 


Therefore, the core of the problem is to represent all rooted trees, in 
which each vertex has at most 3 sons. 


6.1. Rooted Trees 


In this section we use the term ‘rooted tree’ to mean a rooted tree 
in which a vertex has at most 3 sons. Rather than merely print all free 
trees we shall employ a simple data structure to store all free trees of 
a given size. In other words, we shall store all representations of
paraffins in a data structure, so that we can carry out computations on
paraffins.




[Figure 3: (a) a free tree with a single centroid and its rooted tree
representation; (b) a free tree with double centroids, represented by
two rooted trees]


Figure 3 - Two free trees and their rooted tree representation


Next, we describe the data structure employed to store all rooted
trees with at most n vertices (Figure 4). Let t be an array of tuples.
Tuple t[m] is the sequence of rooted trees with m vertices, where
m > 1.


A rooted tree has either 0 or 1 vertices, or has a root which has 3
children (each child is a rooted tree). A rooted tree with more than 1
vertex is represented uniquely by a tuple {i, u, j, v, k, w}, where i, j,
and k are the number of vertices in the 3 children of the root, and u,
v, and w are the indices to the children in sequences t[i], t[j], t[k],
respectively. Thus the first child is t[i][u], the second is t[j][v], and
the third is t[k][w]. The values i, j, k satisfy m - 1 = i + j + k, where m
is the number of vertices in the tree. Uniqueness is maintained by
ensuring a lexicographic ordering between (i, u), (j, v), and (k, w). Let
s[m] be the number of rooted trees with m vertices.


Program to Generate Rooted Trees: The program has input parameter 
n—all rooted trees with at most n vertices are to be generated. The 
program has output parameter t, the array of sequence of trees. 




t[2]   {1,0, 0,0, 0,0}
t[3]   {2,0, 0,0, 0,0}   {1,0, 1,0, 0,0}
t[4]   {3,0, 0,0, 0,0}   {3,1, 0,0, 0,0}   {2,0, 1,0, 0,0}   {1,0, 1,0, 1,0}

(In each tuple, the three pairs describe the first child, the second
child, and the third child, in that order.)


Figure 4 - A data structure for rooted trees


Consider the three children of a tree with m + 1 vertices, where
m > 0. Consider a tree represented by the tuple {i, u, j, v, k, w}.


First Child: The first child must have at least m/3 vertices because
the number of vertices in the first child is at least the number of
vertices in the other two children. The first child can have at most m
vertices. Therefore i can take on values in the range ⌈m/3⌉ to m. The
index, u, of the first child is in the range 0 to s[i] - 1.


Second Child: The second child must have at least (m - i)/2
vertices because it has at least as many vertices as the third child, and the
second and third child together have m - i vertices. The second child
can have at most m - i vertices. Therefore j can take on values in the
range ⌈(m - i)/2⌉ to m - i. If i > j, the index, v, of the second child is
in the range 0 to s[j] - 1. If i = j, to ensure that (i, u) is
lexicographically greater than or equal to (j, v), we require that u ≥ v. We can
do this in several ways; we have chosen to ensure lexicographic
ordering by means of a guard (i > j) ∨ (u ≥ v).


Third Child: The third child has precisely m - i - j vertices. If
j > k, the index, w, of the third child is in the range 0 to s[k] - 1. If
j = k, to ensure that (j, v) is lexicographically greater than or equal to
(k, w), we require that v ≥ w.


The program assigns a tuple to t[m + 1] in sequence for m in 1 .. n-1.
The tuple assigned to t[m + 1] has a sequence of elements, each of
which is a tuple {i, u, j, v, k, w}. The tuple assigned to t[m + 1] is
defined by quantification over i, j, k, u, v, w, given in the paragraphs
describing the three children.


rt(n,t)
integer: n;
array [1 .. n] of tuple: t;
{;
   s[0] := 1, s[1] := 1
 , « m in 1 .. n-1 :: t[m+1] := {« i in ⌈m/3⌉ .. m,
                                  j in ⌈(m-i)/2⌉ .. min(i,m-i),
                                  k in (m-i-j) .. (m-i-j),
                                  u in 0 .. s[i]-1, v in 0 .. s[j]-1, w in 0 .. s[k]-1 ::
                                  ((i > j) ∨ (u ≥ v)) ∧ ((j > k) ∨ (v ≥ w)) -> {i,u,j,v,k,w}
                                »}
   , s[m+1] := sizeof(t[m+1])
   »
}
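
The quantification above can be checked with a small sketch in Python, given here only as an illustration and not as part of the PCN solution. The names rooted_trees and ceil_div are hypothetical, dictionaries stand in for the arrays t and s, and the order in which tuples appear within t[m] is not meant to match Figure 4.

```python
def ceil_div(a, b):
    """Ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def rooted_trees(n):
    """Enumerate rooted trees (at most 3 sons per vertex) with up to n vertices.

    Returns (t, s): t[m] lists the 6-tuples (i, u, j, v, k, w) for trees
    with m vertices (m >= 2), and s[m] is the number of trees with m vertices.
    """
    s = {0: 1, 1: 1}
    t = {}
    for m in range(1, n):              # build the trees with m + 1 vertices
        trees = []
        for i in range(ceil_div(m, 3), m + 1):
            for j in range(ceil_div(m - i, 2), min(i, m - i) + 1):
                k = m - i - j
                for u in range(s[i]):
                    for v in range(s[j]):
                        for w in range(s[k]):
                            # guard: (i,u) >= (j,v) >= (k,w) lexicographically
                            if (i > j or u >= v) and (j > k or v >= w):
                                trees.append((i, u, j, v, k, w))
        t[m + 1] = trees
        s[m + 1] = len(trees)
    return t, s
```

Calling rooted_trees(4) yields the same tuples as Figure 4, up to their order within each t[m].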


Next we give the program for printing a rooted tree. A simple
program is to print t[i] for all i; however, representing a tree by a
six-tuple may not give adequate information about the structure of the
tree, so we also propose another algorithm. The program p has input
parameters t, the indices i, u in t of the tree to be printed, and for
convenience, the level number, l, of the tree and the son number, s,
where sons are numbered 0, 1, 2.


p(t,i,u,l,s)
array [1 .. n] of tuple: t;
integer: i, u, l, s;
{; print("level", l, "son", s, "number of vertices", i)
 , (i > 1) -> {; « k in [0 .. 2] ::
                   p(t, t[i][u][2*k], t[i][u][1+2*k], l+1, k) »}
}


Next we give the program to print all free trees, of size m + 1 with 
a single centroid. It is similar to the program that constructs all 
rooted trees, because it generates all rooted trees in which the root 
has at most 4 sons, and all other vertices have at most 3 sons. 


single(m)
integer: m;
{; « i in [⌈m/4⌉ .. m],
     j in [⌈(m-i)/3⌉ .. min(i,m-i)],
     k in [⌈(m-i-j)/2⌉ .. min(j,m-i-j)],
     l in [(m-i-j-k) .. (m-i-j-k)],
     u in 0 .. s[i]-1, v in 0 .. s[j]-1,
     w in 0 .. s[k]-1, x in 0 .. s[l]-1 ::
     ((i > j) ∨ (u ≥ v)) ∧ ((j > k) ∨ (v ≥ w)) ∧ ((k > l) ∨ (w ≥ x)) ->
       {; p(t,i,u,1,0), p(t,j,v,1,1), p(t,k,w,1,2), p(t,l,x,1,3)} »}


Finally we give the program for printing all free trees with double 
centroids. 


double(n)
integer: n;
n mod 2 = 0 -> {; « u in [0 .. s[n/2]-1],
                    v in [u .. s[n/2]-1] ::
                    {; p(t,n/2,u,1,0), p(t,n/2,v,1,0)}
                  »}


The program for printing all free trees of size up to n is simply the
sequential composition of single(m) and double(m) for all m in [1 .. n].




Figure 5 - The Doctor's Office Problem 


7. Doctors’ Office 


Let patients be indexed i, where 0 ≤ i < np. Let sick[i] be the road
that the i-th patient takes to go to the hospital (Figure 5). All sick 
patients enter through a door, and thus form a single queue for the 
receptionist. The list of sick patients waiting for the receptionist is 
called admit. The entry door is modeled by the program merge with 
input parameter, the array sick, and output parameter, the list admit. 
Program merge merges its input lists (i.e., elements of its input array) 
to produce the output list. A fair merge is used because every patient 
who gets to the entry door passes through it eventually. 


Let doctors be indexed j, where 0 ≤ j < nd. The list of available
doctors waiting at the receptionist’s desk is called d; each entry in the
list is the id of a doctor.


The receptionist takes elements of admit and d and produces a list 
of {sick-patient, doctor} pairs. This list is called spd. 


The list of {sick-patient, doctor} pairs goes through a passage and 
each patient-doctor pair then enters the appropriate doctor’s office. 
Let the list of {patient, doctor} pairs entering the office of the j-th 
doctor be office[j]. The passage is modeled by a program fork with 
input parameter, the list spd, and an output parameter, the array 
office. Each entry in the input list of program fork contains an index
into its output array; an entry in the input list is copied into the
specified output list. 


The list of patient-doctor pairs leaving office j is called happy[j].
A patient is cured in arbitrary (finite) time. The process of curing by 
doctor j is modeled by a program ran that has a single input list, 
office[j], and a single output list happy[j]. Elements from the input 
list of program ran are placed in its output list after an arbitrary finite 
number of steps. 


Patient-doctor pairs leaving each of the doctors’ offices pass
through a door and join a single output list, cured, of patient-doctor 
pairs. The door is modeled by a program merge with input, happy, and 
output, cured. 


The receptionist processes the elements of cured. The doctor 
identified by the doctor field joins a list fd of doctors who have fin- 
ished seeing the receptionist. The list d of available doctors is the 
initial queue of available doctors followed by fd. The patient identified 
by the patient field in cured joins the list, out, of patients going home.
A patient at the head of list out goes through the exit and then goes
to his or her home; let the road home for patient i be healthy[i].
The exit is modeled by a program fork which has input, out, and 
output, healthy. 


A healthy person becomes sick in an arbitrary finite time. Thus an 
entry in list healthy[i] becomes an entry in list sick[i] in arbitrary 
finite time. We model patient i falling ill by program ran which has 
input healthy [i] and output sick [i]. 


Initial Conditions: We assume that initially all patients are healthy and
all doctors are available. Therefore, the input to ran is the initial value
i (because patient i is healthy initially) followed by sequence
healthy[i] of returns from the hospital after being cured. The
sequence of available doctors d is the sequence of doctors in
increasing order of id (i.e., the sequence j for j from 0 up to nd-1)
followed by fd, the sequence of doctors who finish seeing the
receptionist. We define d by a program makelist that has inputs nd
and fd and output d.


hospital(admit, spd, out, np, nd)
integer: np, nd;
{||
   array [0 .. np-1] of synch: healthy, sick;
   array [0 .. nd-1] of synch: office, happy;
   « i in [0 .. np-1] :: ran({i, healthy[i]}, sick[i]) »
 , merge(sick,np,0,np,admit)
 , receptionist(admit,d,cured,spd,out,fd)
 , fork(spd,nd,office)
 , « j in [0 .. nd-1] :: ran(office[j],happy[j]) »
 , merge(happy,nd,0,nd,cured)
 , fork(out,np,healthy)
 , makelist(nd,fd,d)
}


7.1. ran 


Program ran has a single input sequence x and a single output se- 
quence y. An element in the input sequence is placed in the output 
after an arbitrary (finite) number of steps. In the following program, if 
the first guard is chosen, an element from x is placed in y. If the se- 
cond guard is chosen, y remains unchanged. The first guard will be 
chosen eventually because the program is constructed using fair com- 
position. (If we want to allow the possibility of healthy patients re- 
maining healthy forever, choice composition would be used in place of 
fair composition.) If the input is the empty list then the output is the 
empty list. 


ran(x,y)
{~ ({hx,tx} ≜ x) -> {|| ran(tx,ty), y := {hx,ty}}
 , ({hx,tx} ≜ x) -> ran(x,y)
 , ({} ≜ x) -> y := {}
}


7.2. merge


We use a tree of merge programs, as in Hamming’s problem, except 
that the program for merging two sequences is different. Here we 
give only the program, m2, for merging two sequences, because the 
remainder of the program is identical to Hamming’s problem. 


Program m2 has two input lists u and v, and a single output list w. 
Program m2 defines a relation between u, v, and w, by giving values of w 
that satisfy the relation for given values of u and v. List w is defined in 
terms of u and v as follows: 


1. If both u and v are empty lists, then w is the empty list. 


2. If list u is not empty then let hu and tu be the head and 
tail (respectively) of u; list w is defined as {hu,tw}, 
where tw is defined by the relation m2 (tu, v,tw), or 


3. if list v is not empty then let hv and tv be the head and 
tail of v (respectively); list w is defined as {hv,tw}, 
where tw is defined by the relation m2 (u, tv, tw). 


m2(u,v,w)
{~ ({} ≜ u), ({} ≜ v) -> w := {}
 , ({hu,tu} ≜ u) -> {|| m2(tu,v,tw), w := {hu,tw}}
 , ({hv,tv} ≜ v) -> {|| m2(u,tv,tw), w := {hv,tw}}
}




7.3. fork 


Program fork has two inputs: the tuple x and index m. Its output is
an array y of tuples; each of these tuples represents a list. Array y is
indexed i, where 0 ≤ i < m. The input x represents a list of 2-tuples
{j,v}, where 0 ≤ j < m.


Program fork defines output y as a function of its inputs. Output y 
is defined as follows: 


1. If x is the empty list, then for all i: y[i] is the empty list.


2. If x = {hx,tx}, let hx = {j,v}; for
all i other than j, y[i] is defined as z[i], and y[j] is
defined as {hx,z[j]}, where z is defined by fork(tx,m,z).


fork(x,m,y)
integer: m;
array [0 .. m-1] of synch: y;
{?
   ({} ≜ x) -> {|| « i in [0 .. m-1] :: y[i] := {} »}
 , ({hx,tx} ≜ x), ({j,v} ≜ hx) ->
     {|| array [0 .. m-1] of synch tuple: z;
         fork(tx,m,z),
         « i in [0 .. m-1] :: (i ≠ j) -> y[i] := z[i],
                              (i = j) -> y[i] := {hx,z[i]}
         »}}
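
As an informal check of the definition of fork, the following Python sketch distributes a finite list of {j,v} pairs over m output lists. It is an illustration only: Python tuples and lists stand in for PCN tuples, and the iterative formulation is an assumption of the sketch, whereas the PCN program is defined recursively on the tail of x.

```python
def fork(x, m):
    """Copy each entry (j, v) of x into output list y[j], preserving order."""
    y = [[] for _ in range(m)]
    for hx in x:
        j, _v = hx          # the first field of each entry selects the output list
        y[j].append(hx)     # the whole entry hx is copied, as in the PCN program
    return y
```

For instance, fork([(1, "a"), (0, "b"), (1, "c")], 2) places the second entry in y[0] and the other two, in their original order, in y[1].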


7.4. Receptionist 


Program receptionist defines a relation between its arguments by 
defining values of its outputs, spd, out, and fd, for given values of its 
inputs, admit, d, and cured, as follows: 


1. If any of the inputs is the empty list then all of the 
outputs are empty lists. 




2. If admit = {i,ta} and d = {j,td}, then spd =
{{i,j},tspd}, where tspd is defined by receptionist
(ta,td,cured,tspd,out,fd).


3. If the head element of cured is {i,j}, then
out = {i,tout} and fd = {j,tfd}, where tout and tfd
are defined by receptionist (admit,d,tc,spd,tout,tfd),
where tc is the tail of cured.


receptionist(admit,d,cured,spd,out,fd)
{? (sizeof(admit) = 0) ∨ (sizeof(d) = 0) ∨ (sizeof(cured) = 0) ->
     {|| spd := {}, out := {}, fd := {}}
 , ({i,ta} ≜ admit), ({j,td} ≜ d) ->
     {|| receptionist(ta,td,cured,tspd,out,fd)
       , spd := {{i,j},tspd}}
 , ({{i,j},tc} ≜ cured) ->
     {|| receptionist(admit,d,tc,spd,tout,tfd)
       , out := {i,tout}
       , fd := {j,tfd}
     }}
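
The relation computed by the receptionist can be pictured on finite lists with the following Python sketch. It is a deliberately simplified illustration: it takes complete input lists as a snapshot and ignores both the feedback of fd into d and the nondeterministic interleaving of the two rules, which the PCN program handles through incremental definition of its tuples.

```python
def receptionist(admit, d, cured):
    """Pair waiting patients with available doctors and split cured pairs.

    admit: list of patient ids; d: list of doctor ids;
    cured: list of (patient, doctor) pairs.
    Returns (spd, out, fd).
    """
    spd = list(zip(admit, d))                      # {sick-patient, doctor} pairs
    out = [patient for patient, _doctor in cured]  # patients going home
    fd = [doctor for _patient, doctor in cured]    # doctors who are free again
    return spd, out, fd
```

With three waiting patients and two available doctors, only the first two patients are paired; the third remains unpaired in this snapshot.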


7.5. makelist 


Program makelist has two input parameters, nd and fd, and a 
single output parameter d. Parameters fd and d are lists, and nd is an 
integer. The program defines its output as a function of its inputs as 
follows: 


1. If nd = 0 then d = {0,fd}.


2. If nd > 0 then d is defined by makelist (nd-1,d,{nd,fd}).


makelist(nd,d,fd)
integer: nd;
{? nd = 0 -> d := {0,fd}
 , nd ≠ 0 ->
     {|| makelist(m,d,t), m := nd - 1, t := {nd,fd}}
}
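
Read literally, the two rules for makelist prepend the ids nd, nd-1, ..., 0 to fd. The Python sketch below follows that literal reading on a finite list; the function returns d instead of taking it as an output parameter, which is a convention of this sketch only.

```python
def makelist(nd, fd):
    """Return the list 0, 1, ..., nd followed by the elements of fd."""
    d = list(fd)
    while nd > 0:          # rule 2: makelist(nd-1, d, {nd, fd})
        d = [nd] + d
        nd -= 1
    return [0] + d         # rule 1: d = {0, fd}
```

Under this reading makelist(2, [9]) yields the ids 0, 1, 2 followed by 9.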


8. Skyline Matrix


Data Structures: We shall first give the algorithm described in [2] and
later modify it (as in [2]) to avoid storing zeroes. The problem is to
obtain the LU-decomposition of matrix a. Let L, U, and a be arrays
[1 .. n, 1 .. n] of reals. Let d be an array [1 .. n] of reals. The lower
triangular array of the decomposition is stored in L, the upper
triangular array is stored in U, and the diagonal elements (of L) are
stored in d.


Let r, c be arrays [1 .. n] of integers, where 


1. r[i] ≤ i, and for all j where j < r[i]: a[i, j] = 0, and
2. c[j] ≤ j, and for all i where i < c[j]: a[i, j] = 0.


Thus r and c define the skyline of nonzero entries in the array a.


The original description of the algorithm is given below. (See 
algorithm 2.1, page 5, [2].) Initially L contains the lower half of a, and 
U contains the upper half of a, and d contains the diagonal elements of
a.


Program Specification:
For i := 1 to n do
  [For j := r[i] to i - 1 do
     L[i,j] := L[i,j] - \sum_{k=\max(r[i],c[j])}^{j-1} (L[i,k] * U[k,j]);

   For j := c[i] to i - 1 do
     U[j,i] := (U[j,i] - \sum_{k=\max(c[i],r[j])}^{j-1} (L[j,k] * U[k,i])) / d[j];

   d[i] := d[i] - \sum_{k=\max(r[i],c[i])}^{i-1} (L[i,k] * U[k,i])]




Memory Optimization: 


Let l be an array [1 .. n] of tuples, where l[i] is the tuple of elements
a[i, j], where r[i] ≤ j < i. Thus all nonzeroes in row i of a to the left of
the diagonal are in tuple l[i]. Thus a[i, j] = l[i][j - r[i]] if r[i] ≤ j < i.


Similarly, let u be an array [1 .. n] of tuples, where u[j] is the tuple of
elements a[i, j], where c[j] ≤ i < j. Thus all nonzeroes in column j of a
above the diagonal are in tuple u[j]. Thus a[i, j] = u[j][i - c[j]] if c[j] ≤ i < j.


The program in PCN is obtained from the specification by the 
following syntactic transformation. Replace 


• \sum_{k=low}^{high} exp by (+ over « k in [low .. high] :: exp »),
• L[i,j] by l[i][j - r[i]],
• U[j,i] by u[i][j - c[i]].


The resulting program is: 


lu(l,u,d,r,c)
array [1 .. n] of integer: d,r,c;
array [1 .. n] of tuple: l,u;
{;
  « i in [1 .. n] ::
     « j in [r[i] .. i-1] ::
        l[i][j - r[i]] := (l[i][j - r[i]] -
           (+ over « k in [max(r[i],c[j]) .. j-1] ::
              l[i][k - r[i]] * u[j][k - c[j]] »))
     »
     « j in [c[i] .. i-1] ::
        u[i][j - c[i]] := (u[i][j - c[i]] -
           (+ over « k in [max(c[i],r[j]) .. j-1] ::
              l[j][k - r[j]] * u[i][k - c[i]] »)) / d[j]
     »
     d[i] := (d[i] - (+ over « k in [max(r[i],c[i]) .. i-1] ::
              l[i][k - r[i]] * u[i][k - c[i]] »))
  »}


Back solve is obtained from its specification by the same syntactic 
transformation. 
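
To make the specification concrete, here is a minimal Python sketch of the same recurrences. It is an illustration under stated assumptions, not the authors' program: it uses dense, 0-indexed arrays rather than the packed tuples l and u, it returns new arrays instead of updating in place, and the helper multiply_factors is a hypothetical name added only to check the result.

```python
def skyline_lu(a, r, c):
    """Factor a = (L + diag(d)) * (U + I), skipping zeros outside the skyline.

    a: n x n list of lists; r[i] <= i is the first nonzero column of row i;
    c[j] <= j is the first nonzero row of column j (all 0-indexed).
    Returns (L, U, d) with L strictly lower and U strictly upper triangular.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i in range(n):
        for j in range(r[i], i):                      # row i of L
            acc = sum(L[i][k] * U[k][j] for k in range(max(r[i], c[j]), j))
            L[i][j] = a[i][j] - acc
        for j in range(c[i], i):                      # column i of U
            acc = sum(L[j][k] * U[k][i] for k in range(max(c[i], r[j]), j))
            U[j][i] = (a[j][i] - acc) / d[j]
        acc = sum(L[i][k] * U[k][i] for k in range(max(r[i], c[i]), i))
        d[i] = a[i][i] - acc                          # diagonal entry
    return L, U, d


def multiply_factors(L, U, d):
    """Rebuild the matrix (L + diag(d)) * (U + I) to check a factorization."""
    n = len(d)
    lo = [[L[i][j] if j < i else (d[i] if j == i else 0.0) for j in range(n)]
          for i in range(n)]
    up = [[U[i][j] if j > i else (1.0 if j == i else 0.0) for j in range(n)]
          for i in range(n)]
    return [[sum(lo[i][k] * up[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For a tridiagonal matrix such as [[4, 2, 0], [2, 5, 3], [0, 3, 6]] with r = c = [0, 0, 1], multiplying the factors back together recovers the original matrix.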


References 


1. Chandy, K. M. and J. Misra. Parallel Program Design. Addison 
Wesley, Reading, MA, 1988 


2. Eisenstat, S. C. and A. H. Sherman. Subroutines for envelope
solution of sparse linear systems. Research Report 35, Yale 
University, New Haven, CT, October 1974 


3. Foster, I. and S. Taylor. Strand: New Concepts in Parallel Program- 
ming. Prentice Hall, Englewood Cliffs, NJ, 1990 


4. Hoare, C. A. R. Communicating Sequential Processes. Prentice Hall 
International, London, England, 1984 


Appendix 


Control flow from one block to another is defined in the appendix, 
in the context of program composition. 


Associate a unique name with each fair composition block. In an 
execution of a program, a block may be executed an arbitrary number 
of times. Consider a given execution of a program, and give a unique 
name to each block execution within the program execution. Control 
flows from an execution of block b to an execution of block d if: 


1. b is a parallel composition block and d is one of the
blocks in the block-list of b, or


2. b is a sequential composition block and d is the first
block in the block-list of b, or


3. a sequential composition block is executed in which b
precedes d in the block-list, or


4. b is a choice-composition block or a fair-composition
block, and g -> d is in the block-list of b, and in the
execution of b, g evaluates to success and then d is
executed, or


5. there exists a block c such that control flows from b to
c, and from c to d.


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved. 297 


The Scheme Programming Language 


John Franco 
Department of Computer Science 
University of Cincinnati 
Mail Location 8 
Cincinnati, OH 45221 


Daniel P. Friedman
Department of Computer Science
Indiana University
Bloomington, IN 47405


Olivier Danvy
Department of Computing and Information Sciences
Kansas State University
Manhattan, KS 66506


1. History and Features of Scheme


Scheme was developed in 1975 by Gerald J. Sussman and Guy L.
Steele Jr. [4]. Their goal was to build a Lisp-like prototype of Carl
Hewitt’s Actors system. To their surprise, after removing several
layers of syntactic and semantic sugar, what remained was an extension
of an applicative order λ-calculus supporting state and first-class
continuations. Since then there have been four revisions of the language
definition and an effort is underway to create an IEEE standard for
Scheme. Scheme today still stands as an extended λ-calculus with
declarative and imperative components. It is in use at numerous
universities and research centers as the first programming language for
students or at the core of research projects or software products. A
complete description of Scheme can be found in [1,3].


Due to its Lisp origin, the basic syntactic units of Scheme are fully 
parenthesized expressions. The basic control structure of Scheme is 


298 J. Franco, et. al 


procedure application. Procedures are either represented as
λ-expressions or are primitive, operating on numbers, strings, symbols,
cons pairs, lists (built out of cons pairs), vectors, and global i/o
streams.


Besides being applied to data types other than functions, Scheme
extends λ-calculus in three directions:


• with conditional expressions;


• with sequencing and first-class access to control via
continuations;


• with side-effects on variables, structures, and the
outside world.


Scheme is call-by-value, i.e., procedures are applied to the values of 
their arguments, in contrast to ALGOL 60 and lazy languages. A con- 
venient way to delay the evaluation of an expression is to define it as a 
parameterless procedure. Applying this procedure to no arguments 
entails the evaluation of the delayed expression. This makes it simple 
to define streams, as illustrated by the solution to the extended 
Hamming problem. 
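
The idea of delaying evaluation with a parameterless procedure is not specific to Scheme. The sketch below shows it in Python rather than Scheme, purely as an illustration; the names expensive, delayed, and log are hypothetical and chosen only for this example.

```python
log = []


def expensive():
    """Stand-in for a costly expression; records when it is evaluated."""
    log.append("evaluated")
    return 6 * 7


# Wrapping the expression in a parameterless procedure delays it:
delayed = lambda: expensive()
evaluated_before = list(log)   # nothing has been evaluated at this point

# Applying the procedure to no arguments forces the evaluation:
value = delayed()
```

Only the final application runs the body of expensive; until then the wrapped expression is just a value that can be stored or passed around.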


Scheme is block-structured and lexically scoped. All the solutions 
in this chapter make use of this property. 


Procedures are first-class objects. They are either primitive or
result from evaluating a λ-expression, and can be passed as arguments,
returned as results, and stored in data structures.


Because Scheme is lexically scoped, procedures can be viewed as 
instances of the same λ-expression, each with a local environment and
state. This and the fact that procedures are first-class objects make it 
easy to program in an object-oriented style, as illustrated in the solu- 
tion to the doctors-patients problem. 
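
As a small illustration of this point, written in Python rather than Scheme, each call to the hypothetical make_counter below returns a new procedure with its own private state, so that procedures behave like simple objects. The example is a sketch and is not taken from the solutions in this chapter.

```python
def make_counter():
    """Return a procedure that carries its own local state."""
    count = 0

    def increment():
        nonlocal count     # refers to the variable of the enclosing call
        count += 1
        return count

    return increment


first = make_counter()
second = make_counter()    # an independent instance of the same procedure text
```

Calling first twice and second once leaves the two counters at different values, because each closure owns a separate environment.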


Scheme 299 


The implementations of Scheme are properly tail-recursive. Thus, 
recursion is automatically turned into iteration in many cases. 
Consequently, the language does not provide explicit loop constructs. 
This property is exploited in the solution to the doctors-patients 
problem where communication between patients, doctors, and recep- 
tionists proceeds in round-robin fashion. 


Because they are fully parenthesized, Scheme programs can be rep- 
resented as lists. This makes it simple to treat programs as data ob- 
jects. In particular, syntactic extensions (macros) are expressible 
within the Scheme world without resorting to an outside abstract syn- 
tax. 


Scheme expressions are dynamically typed, i.e., type-checking is 
performed at runtime. This makes it possible to express programs 
that have no type. 


Scheme does not explicitly support any form of parallelism today. 
However, by requiring that the order of evaluation of subexpressions 
be arbitrary in applications, the Scheme designers are leaving open 
some form of collateral evaluation. 


Finally, Scheme inherits its representation of lists with cons pairs, 
its implicit memory management relying on garbage collection, and its 
quotation from Lisp. 


2. The Syntax of Scheme 


For the purposes of this chapter, we use the following BNF for 
Scheme. 


<declaration> ::=
      <expression>
    | (define <variable> <expression>)
    | (extend-syntax (<identifier>) [(<identifier> <input-spec>) <expression>]*)

<expression> ::=
      <constant>
    | <variable>
    | <application>
    | (lambda (<variable>*) <expression>)
    | (lambda <variable> <expression>)
    | (if <test> <consequent> <alternative>)
    | (case <tag> [(<symbol>+) <expression>]* [else <expression>])
    | (cond [<test> <expression>]+ [else <expression>])
    | (and <expression>*)
    | (or <expression>*)
    | (let ([<variable> <value>]*) <expression>)
    | (letrec ([<variable> <value>]*) <expression>)
    | (let* ([<variable> <value>]*) <expression>)
    | (begin <expression>*)
    | (set! <variable> <expression>)
    | (apply <expression> <expression>)
    | '<S-expression>

<application> ::= (<expression> <expression>*)
<input-spec> ::= <S-expression>*
<consequent>, <alternative>, <value>, <tag> ::= <expression>


Square brackets stand for regular parentheses; experience shows
that they improve the readability of programs. The superscript *
denotes zero or more occurrences of the preceding form; the
superscript + denotes one or more occurrences of the preceding form; and
<test> denotes a Boolean expression.


There are three kinds of declarations: the definition of a global
variable; the definition of a syntactic extension; and plain expressions.




2.1. Global variables 


Global variables are defined in the global environment, and bound to 
the value of the corresponding expression. All procedures defined 
globally are mutually recursive. In particular, there is no need to de- 
clare recursive procedures explicitly where they are global. This con- 
trasts to declaring local procedures (with letrec). 


2.2. Syntactic extensions 


Syntactic extensions offer a macro facility based on specifying ex- 
pansions conditionally, depending on syntactic contexts selected by 
pattern matching. While not unanimously agreed upon in the Scheme 
community, they provide a very convenient expressive power and are 
sufficiently well-understood for serving the purposes of this chapter. A 
complete description of extend-syntax can be found in [1,2]. 


For example, let us now define the following syntactic extensions 
rec, when, and unless that will be used in the rest of this chapter. 


(extend-syntax (rec)
  [(rec ide exp)
   (letrec ([ide exp])
     ide)])
(extend-syntax (when)
  [(when test exp)
   (if test
       exp
       'undefined)])
(extend-syntax (unless)
  [(unless test exp)
   (if test
       'undefined
       exp)])


rec is a recursive declaration and evaluates to the recursively declared 
procedure. when and unless implement a “one-arm” conditional ex- 
pression. 


2.3. Expressions 


An expression can be an expression from the λ-calculus or be a
declarative or imperative extension to the λ-calculus.


2.3.1. Pure Scheme expressions 


Constants, variables, λ-expressions, and applications give
Scheme the full power of an applied, weakly typed, and applicative
order λ-calculus. Constants (strings, numbers, etc.) evaluate to ground
values. Variables denote values in the current environment.
λ-expressions evaluate to first-class procedures where free variables are
bound lexically. Expressions occurring in an application are evaluated
in any order. The first one evaluates to a procedure. This procedure
is applied to the values of the remaining expressions.


2.3.2. Conditional and declarative extensions


Scheme provides conditional and declarative extensions to the
λ-calculus. These include conditional expressions (if, case, cond),
logic operations (and, or), and local definitions (let, letrec).


if evaluates its test-expression. If the result is not the Boolean
value false, then the consequent is evaluated, otherwise the alternative
is evaluated. case evaluates the tag expression and returns the value
of the first expression one of whose associated symbols matches the
tag. The last symbol else matches any value. cond has the same
semantics as in Lisp. and returns false if one of its expressions evaluates
to false. Sub-expressions in and (resp. or) are evaluated from left to
right until a false (resp. non-false) value is found or until all the
expressions are evaluated. and (resp. or) expressions evaluate either to
false if an expression has evaluated to false (resp. if all expressions
have evaluated to false), or to the last (resp. the first) non-false value.
let declares parallel local bindings, and letrec declares mutually
recursive local bindings.


2.3.3. Imperative extensions 


Scheme provides imperative extensions to the λ-calculus. These
include let*, begin, and set!. let* declares local bindings
sequentially. begin sub-expressions are evaluated sequentially and the
value of the last expression is returned. set! assigns the value of an
expression to a variable provided that this variable is declared in the
lexical environment.


2.3.4. Application


(apply <expression> <expression>) 


returns the result of applying the value of the first expression to the 
value of the second, where the first value is a first-class procedure and 
the second is a list of arguments for this procedure.


2.3.5. Primitive operators 


In addition to these special forms, Scheme provides primitive pro- 
cedures for processing ground constructs. Their name is postfixed 
with a question mark if they are predicates, and with an exclamation 
mark if they perform a side-effect—this convention is usually followed 
by Scheme programmers in their own programs. 


For example, zero? checks whether its argument is the number 0, 
and null? checks whether its argument is an empty list. eqv?
checks whether its arguments are the same value. eq? checks
whether its arguments are the same object. 


cons, car, and cdr are the usual list building and decomposition 
operations. set-car! and set-cdr! update the two fields of a cons 
cell. 


In addition, Scheme provides structural predicates. For example, 
pair? tests whether its argument is a cons cell, number? tests 
whether its argument is a number, and so on. 


Finally, the procedure call-with-current-continuation is 
passed a unary procedure, and applies it to a procedural abstraction of 
the current continuation. Applying this continuation later to a value 
will restore the current continuation and pass it this value. 


2.3.6. Representation of values 


The Boolean constants true and false are represented by #t and #f, 
respectively. The empty list is denoted by '(). Contrary to its ances- 
tor Lisp, the representations of false and of the empty list are not re- 
quired to coincide in Scheme. 


2.3.7. Quotation 


Scheme inherits quotation from Lisp. This meta-structural facility 
allows one to specify values such as numbers, strings, symbols, 
vectors—or any list or vector of these—within the text of programs. 
This of course strongly relies on the direct representation of programs 
as data. As in Lisp, the quotation '<S-expression> abbreviates
(quote <S-expression>).




2.4. Conclusion 


This section has presented the syntax and some of the semantics 
and rationale of Scheme. A formal description of its semantics can be 
found in [3]. 


3. Hamming’s Problem Extended


Let a, b, c, ... be an ordered list of prime numbers. Hamming’s
problem is to produce the ordered stream containing integers
matching the product

    $a^i \times b^j \times c^k \times \cdots$

where i, j, k, ... are non-negative integers and at least one of i, j, k, ...
is non-zero.


3.1. Analysis of the modified problem


Let T be any integer and suppose Hamming’s problem is modified
to return the ordered stream given by

    $T \times a^i \times b^j \times c^k \times \cdots$

that is, the solution to the original Hamming’s problem except that all
output tokens are multiplied by the integer T. This stream is
equivalent to the stream beginning with the token $T \times a$ and followed by the
ordered stream of integers equal either to

    $T \times a \times a^i \times b^j \times c^k \times \cdots$

or to

    $T \times b^{j'} \times c^{k'} \times \cdots$

where i, j, k, ... are non-negative and at least one of i, j, k, ... is
non-zero, and j', k', ... are non-negative, and at least one of j', k', ... is
non-zero.


This observation is the basis of our recursive solution to Hamming’s
extended problem. We construct the output stream by appending the
result of merging the streams

    $T \times a \times a^i \times b^j \times c^k \times \cdots$

and

    $T \times b^{j'} \times c^{k'} \times \cdots$

to the output token $T \times a$.


3.2. Scheme properties exploited by our solution 


The implementation of the solution requires the manipulation of 
streams. In particular, streams must be merged and assembled. 
Although Scheme does not have the native ability to operate on or to 
create streams, we can derive it by delaying the construction of
stream tails.


Let us first define the standard operations on streams viewed as
lazy lists.

(extend-syntax (force)
  [(force e) (e)])

(extend-syntax (delay)
  [(delay e) (lambda () e)])

(extend-syntax (empty-stream)
  [(empty-stream) '()])

(extend-syntax (null-stream?)
  [(null-stream? s) (null? s)])

(extend-syntax (car-stream)
  [(car-stream s) (car s)])

(extend-syntax (cdr-stream)
  [(cdr-stream s) (let ([d s])
                    (if (procedure? (cdr d))
                        (begin
                          (set-cdr! d (force (cdr d))) (cdr d))
                        (cdr d)))])

(extend-syntax (cons-stream)
  [(cons-stream a d) (cons a (delay d))])
Then, a procedure merging two streams of increasing integers is 


(define merge-streams ;;; Stream(Int) x Stream(Int) -> Stream(Int)
  (lambda (s1 s2)
    (cond
      [(null-stream? s1)
       s2]
      [(null-stream? s2)
       s1]
      [else
       (let ([a1 (car-stream s1)] [a2 (car-stream s2)])
         (if (<= a1 a2)
             (cons-stream a1 (merge-streams (cdr-stream s1) s2))
             (cons-stream a2 (merge-streams s1 (cdr-stream s2)))))])))


This solution makes liberal use of functions as values and of delaying 
the evaluation of values by expressing them as parameterless proce- 
dures. The resulting program is kept simple by using syntactic ex- 
tensions. 




(define hamming ;;; Int x List(Int) -> Stream(Int)
  (lambda (T lp)
    (if (null? lp)
        (empty-stream)
        (let ([new-T (* T (car lp))])
          (cons-stream new-T (merge-streams
                               (hamming new-T lp)
                               (hamming T (cdr lp))))))))


Figure 1 - Procedure solving Hamming’s extended problem 
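
The syntactic extensions above depend on extend-syntax, which not
every Scheme implementation provides. As a minimal sketch, assuming
only standard procedures, the same idea can be written with explicit
thunks. The names stream-head, stream-tail, and stream-take are ours
(hypothetical), and unlike cdr-stream above, stream-tail does not
cache the forced tail.

```scheme
;; A stream is either '() or a pair whose cdr is a thunk
;; (a parameterless procedure) producing the rest of the stream.
;; Assumption: no macros, so each tail is wrapped in (lambda () ...)
;; by hand at the point where the stream is built.
(define (stream-head s) (car s))
(define (stream-tail s) ((cdr s)))      ; forces the tail, no caching

(define (merge-streams s1 s2)
  (cond
    ((null? s1) s2)
    ((null? s2) s1)
    ((<= (stream-head s1) (stream-head s2))
     (cons (stream-head s1)
           (lambda () (merge-streams (stream-tail s1) s2))))
    (else
     (cons (stream-head s2)
           (lambda () (merge-streams s1 (stream-tail s2)))))))

(define (hamming t lp)
  (if (null? lp)
      '()
      (let ((new-t (* t (car lp))))
        (cons new-t
              (lambda ()
                (merge-streams (hamming new-t lp)
                               (hamming t (cdr lp))))))))

;; Collect at most n leading elements of a stream into a list.
(define (stream-take s n)
  (if (or (null? s) (= n 0))
      '()
      (cons (stream-head s)
            (stream-take (stream-tail s) (- n 1)))))
```

For instance, (stream-take (hamming 1 '(2 3)) 8) evaluates to
(2 3 4 6 8 9 12 16). Dropping the cache keeps the sketch short at the
cost of recomputing a tail each time it is forced.
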


3.3. Our solution 


The solution to this problem involves merging pairs of arbitrarily 
long lists. These lists are represented as streams. 


Procedure hamming, displayed in Figure 1, is passed an output token
T and a list of primes lp in increasing order. hamming produces a
stream of integers in increasing order, equal to

$T \times a^i \times b^j \times c^k \cdots$

where i, j, k, ... are non-negative integers, at least one of
i, j, k, ... is non-zero, and a, b, c, ... are elements of the prime
list lp.


Figure 2 displays a set of procedures to test this solution. The im- 
portant procedure here is print-stream! which takes as input an in- 
teger m specifying the largest stream element to be output and an in- 
teger n specifying the maximum number of stream elements to be 
printed. 


3.4. What have we learned in this section? 


The solution to Hamming’s problem can be expressed recursively
using streams. Although recursion is natural in Scheme, an explicit
representation of streams was needed. It is simple to make these
streams fully lazy (to avoid multiple representations of the same
stream), but experience shows that the resulting program does not
improve much in this example.


(define test ;;; Unit -> Unspecified
  (lambda ()
    (print-stream! (hamming 1 '(2 5 11 53)) 300 60)))

(define print-stream! ;;; Stream(Val) x Int x Int -> Unspecified
  (lambda (s m n)
    (if (or (null-stream? s) (zero? n))
        (newline)
        (let ([tmp (car-stream s)])
          (begin
            (my-write! tmp)
            (when (< tmp m)
              (print-stream! (cdr-stream s) m (sub1 n))))))))

(define my-write! ;;; Val -> Unspecified
  (lambda (v)
    (begin (write v) (display " "))))


Figure 2 - A sample test of the procedures solving
Hamming’s extended problem


4. The Paraffin Problem


The problem is to find all isomers of the chemical composition
$C_nH_{2n+2}$. The problem reduces to finding all trees of n vertices
and maximum degree 4. Each vertex or node in such a tree represents a
carbon atom (there is no need to represent hydrogen atoms).




(define para
  (lambda (n)
    (let ([A  (create-array n)]
          [B  (make-vector (1+ n) '())]
          [t2 (ceiling (/ n 2))])
      (begin
        (map (lambda (x)
               (let ([tem (trees B x)])
                 (begin
                   (split-t A tem x n)
                   (vector-set! B x tem))))
             (incr-ord-list n))
        (construct-trees A n 'even (- n 1) (height-proc t2 2))
        (construct-trees A n 'odd (- n 2) (height-proc t2 2))
        (construct-trees A n 'odd (- n 2) (height-proc t2 3))
        (construct-trees A n 'odd (- n 2) (height-proc t2 4))
        '()))))


Figure 3 - The main procedure 


4.1. The input and output specification 


The program para shown in Figure 3 takes as input an integer n. 
The output is a list of trees of maximum degree four and no two trees 
in the list are isomorphic. The iterating procedure map is defined in 
the appendix. 


4.2. Representation of trees 


Each tree in the output list is actually a collection of two or more 
subtrees. Each subtree is enclosed between angle brackets, <>, for 
readability. The number of nodes of all subtrees is n. The difference 
in height between the two highest subtrees is always zero. In the case 
that the longest path between terminal nodes is even, so that the 
number of nodes on that path is odd, the tree is split at the center 




node into up to four subtrees enclosed in brackets. The symbol <()>
in the left margin indicates such a split and represents the singleton
subtree consisting only of the center node. When the longest path is 
odd, so that the number of nodes on the path is even, there is a center 
edge and the split is into two subtrees connected by that edge. 


The subtrees themselves are defined recursively as follows. A
subtree of one node is represented as (). A subtree with root having
one child is represented as

(*subtree of the child*)

A subtree with root having two children is represented as

(*subtree of the left child* *subtree of the right child*)

A subtree with root having three children is represented as

(*subtree left child* *subtree middle child* *subtree right child*)


Thus, the subtree (() () ()) is a four node subtree whose root has
three children; and the subtree (((()))) is a four node subtree that
has all nodes in line.
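
To make the representation concrete, here is a minimal sketch of two
measurements on such nested lists. The names tree-size and
tree-height are ours (hypothetical); the convention that a single
node has height 0 is an assumption chosen to agree with the height
procedure listed in the Appendix.

```scheme
;; Number of nodes: the root plus the nodes of every child subtree.
(define (tree-size t)
  (+ 1 (apply + (map tree-size t))))

;; Height: 0 for a single node, otherwise one more than the
;; height of the tallest child subtree.
(define (tree-height t)
  (if (null? t)
      0
      (+ 1 (apply max (map tree-height t)))))
```

Both (() () ()) and (((()))) have four nodes, but the first has
height 1 and the second has height 3.
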


4.3. Collecting subtrees 


A two dimensional array is used to save all lists of subtrees of height 
h which contain n nodes for each pair (n, h). Each list contains no 
duplicate trees. 


4.4. Uniqueness of the trees


Paraffin trees are assembled from collections of subtrees which
together obey the following restrictions. The sum of the nodes in all
subtrees is n. Excluding the special <()> at the boundary of the left
margin, the height of the subtrees from left to right is nonincreasing
and the first two have the same height. All possible combinations of
subtrees matching each height specification are joined to form
individual paraffin trees. In the case that subtrees have the same
height and the same number of nodes, the combination of subtrees
is ordered to prevent paraffin tree duplicates.


(define split-t
  (lambda (A l x n)
    (if (null? l)
        '()
        (let* ([tree (car l)] [h (1+ (height tree))])
          (begin
            (when (< h (1+ (/ n 2)))
              (array-set! A x h (cons tree (array-ref A x h))))
            (split-t A (cdr l) x n))))))


Figure 4 - Split n-node trees by height and save in array


4.5. Rough description of the algorithm 


The algorithm has two phases. The objective of the first phase is to
collect lists of unique trees of maximum degree four with root of
maximum degree three, one list for every combination of tree height
and size (in number of nodes). Each list is saved as an element of an
array: the i, j location of the array contains the list of unique trees of
size i and height j. The objective of the second phase is to sweep
through the array assembling groups of subtrees which together
represent paraffin trees.
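
The bookkeeping of the first phase can be sketched as follows. This
is an illustration under our own assumptions, not the code of the
chapter: make-table, table-ref, and table-file! are hypothetical
names, and a tree is filed under its node count and a height that
counts a single node as 0.

```scheme
(define (tree-size t) (+ 1 (apply + (map tree-size t))))
(define (tree-height t)
  (if (null? t) 0 (+ 1 (apply max (map tree-height t)))))

;; A rows-by-cols table whose entries all start as the empty list.
;; Each row gets its own vector so that rows are never shared.
(define (make-table rows cols)
  (let ((tbl (make-vector rows)))
    (do ((i 0 (+ i 1)))
        ((= i rows) tbl)
      (vector-set! tbl i (make-vector cols '())))))

(define (table-ref tbl i j)
  (vector-ref (vector-ref tbl i) j))

;; File a tree in the list stored at (size, height).
(define (table-file! tbl tree)
  (let ((i (tree-size tree))
        (j (tree-height tree)))
    (vector-set! (vector-ref tbl i) j
                 (cons tree (table-ref tbl i j)))))
```

After filing (() () ()), ((()) ()), and (((()))) in a 5 by 5 table,
locations (4, 1), (4, 2), and (4, 3) each hold one tree and every
other location is still empty.
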


The first phase is handled by procedure split-t (Figure 4) which
is called from para. The input to split-t is a list l of trees each
containing exactly n nodes and the number n. For each tree in l, split-t
determines its height h and adds it to the existing list of trees having
n nodes and height h which is stored in the array at location (n, h).
The input list l is obtained from procedure build-trees which is
called from trees. The procedure takes as input the number n and
produces a list of unique trees of maximum degree four, with
maximum root degree three. Each node in such trees has at most three
children. Using each triple, a collection of right, middle, and left
subtrees is generated recursively as trees-a, trees-b, and trees-c
within build-tree. The legal sizes of the subtrees are returned by the
procedure sizer.


The subtrees are joined by assemble to create a complete subtree. 
This procedure takes as inputs lists of subtrees (a,b,c,d), and opera- 
tors instantiated as cons or cdr. assemble groups sets of subtrees 
into a list representing another subtree where the children of the root 
are roots of the input subtrees. 


Upon completion of the first phase the array A contains all lists of 
trees that are needed to assemble the output trees. Within para,
procedure done (Figure 5) is invoked four times to assemble the
appropriate collections of trees from the array (actually using
construct-trees in Figure 3). One call handles paraffin trees with an
even number of nodes on their longest path. The other calls handle the
cases where
there is a center node in the longest path and it has two, three, and 
four neighbors. The procedure done retrieves from the array the ap- 
propriate combinations of subtrees to assemble into paraffin trees. 
The procedures height-list and sizer calculate the height and size 
specification of the paraffin trees. These procedures produce, from a 
height specification and a number of nodes n, a list of all triples of
integers summing to n - 1. The procedure sizer is used to select the
sizes of the subtrees to assemble into paraffin trees. The numbers in 
the triples indicate the number of nodes in the right, middle, and left 
subtrees, respectively. Uniqueness of the trees produced by trees is 
ensured, in part, by the uniqueness of the triples with respect to 
height configuration. The procedure assemble, supervised by print- 
tree, assembles the paraffin trees, and the procedure bracket-parts 
handles the output format. 
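
The role of ordered size triples can be illustrated in isolation. The
sketch below is not the sizer procedure of the Appendix; it is a
hypothetical helper, size-triples, which assumes that an absent
subtree is given size 0 and that sizes are listed in non-increasing
order so that no triple appears twice under permutation.

```scheme
;; All triples (a b c) with a >= b >= c >= 0 and a + b + c = m.
(define (size-triples m)
  (let loop-c ((c 0) (acc '()))
    (if (> (* 3 c) m)
        (reverse acc)
        (loop-c (+ c 1)
                (let loop-b ((b c) (acc acc))
                  ;; a = m - b - c must stay at least as large as b
                  (if (> (+ c (* 2 b)) m)
                      acc
                      (loop-b (+ b 1)
                              (cons (list (- m b c) b c) acc))))))))
```

For a root with n = 4 nodes in total, the children share n - 1 = 3
nodes, and (size-triples 3) evaluates to ((3 0 0) (2 1 0) (1 1 1)).
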




(define done
  (lambda (type A n hght-list size-list)
    (begin
      (let
        ([height-a (car hght-list)]
         [height-b (cadr hght-list)]
         [height-c (if (< (length hght-list) 3)
                       0 (caddr hght-list))]
         [height-d (if (< (length hght-list) 4)
                       0 (cadddr hght-list))])
        (map (lambda (y)
               (let ([size-a (car y)]
                     [size-b (cadr y)]
                     [size-c (if (< (length y) 3) 0 (caddr y))]
                     [size-d (if (< (length y) 4) 0 (cadddr y))])
                 (let ([tree-a (array-ref A size-a height-a)]
                       [tree-b (array-ref A size-b height-b)]
                       [tree-c (array-ref A size-c height-c)]
                       [tree-d (array-ref A size-d height-d)])
                   (when (and (not (null? tree-a))
                              (not (null? tree-b))
                              (or (not (null? tree-c)) (< (length y) 3))
                              (or (not (null? tree-d)) (< (length y) 4)))
                     (let ([f1 (if (and (= size-a size-b)
                                        (= height-a height-b)) cdr id)]
                           [f2 (if (and (= size-b size-c)
                                        (= height-b height-c)) cdr id)]
                           [f3 (if (and (= size-c size-d)
                                        (= height-c height-d)) cdr id)])
                       (print-tree type tree-a tree-b tree-c tree-d f1 f2 f3))))))
             size-list))
      '())))


Figure 5 - Assemble trees from height lists, size lists, and 
subtree lists 




4.6. Problem with append 


A problem with this solution is the repeated use of the list
operation append. For one thing, because most of the lists are
intermediate data structures, its imperative counterpart append! could
be used.
However, even append! requires traversing its first argument. To cir- 
cumvent this overhead, one can maintain a tail pointer to each inter- 
mediate list and attach the second list by side-effect. Since this op- 
timization technique does not bring much conceptually to solving the 
paraffin problem, we only mention it and have refrained from actually 
implementing it. 
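
For the curious reader, the tail-pointer technique can be sketched as
follows. This is a minimal illustration under our own assumptions,
not code used in our solution; the names make-tlist, tlist-add!,
tlist-append!, and tlist-items are hypothetical.

```scheme
;; A tail-pointer list is a pair (head . last-cell).
;; Both fields are '() while the list is empty.
(define (make-tlist) (cons '() '()))
(define (tlist-items tl) (car tl))

;; Add x at the end in constant time.
(define (tlist-add! tl x)
  (let ((cell (list x)))
    (if (null? (car tl))
        (set-car! tl cell)
        (set-cdr! (cdr tl) cell))
    (set-cdr! tl cell)))

;; Attach the cells of tl2 to the end of tl1 by side-effect,
;; without traversing either list. tl2 must not be used afterwards.
(define (tlist-append! tl1 tl2)
  (cond
    ((null? (car tl2)) tl1)
    ((null? (car tl1))
     (set-car! tl1 (car tl2))
     (set-cdr! tl1 (cdr tl2))
     tl1)
    (else
     (set-cdr! (cdr tl1) (car tl2))
     (set-cdr! tl1 (cdr tl2))
     tl1)))
```

The price of the constant-time append is that the two lists share
structure afterwards, which is acceptable only for intermediate data.
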


5. The Doctor-Patient Problem 


This simulation involves the interaction of several independent ac- 
tors. Scheme does not have the ability to deal with such computa- 
tional agents directly. Therefore we have built an event-manager for 
scheduling and initiating occurrences of sickness, patient releases, 
etc., based on information obtained directly from procedures that rep- 
resent the actors. The interaction of actors is simulated as invocations 
of corresponding actor procedures. Except for communications with 
the event-manager, invocations never return to their parent callers. 
This mimics real life: a person becomes sick, then proceeds to a re- 
ceptionist, then is treated by a doctor, then goes home and so on. 
This is possible in Scheme because all implementations of the lan- 
guage are required to treat tail-recursion as iteration. 


5.1. Principal program components 


The principal components of our solution to this problem are a pa- 
tient-maker, a doctor-maker, a receptionist, and an event-manager. 
Each procedure is described in detail below. The other minor rou- 
tines can be found in the Appendix. The patient-maker and the doc- 
tor-maker produce one instance of a patient procedure or doctor pro- 




cedure for each patient and doctor. These procedures are anonymous. 
However, since first-class procedures are not only denotable values 
(i.e., bound to identifiers) but are expressible values as well (i.e., resul- 
ting from the evaluation of an expression), they can be passed to the 
event-manager and receptionist directly. The receptionist pairs 
patients and doctors. The event-manager schedules events and drives 
the simulation when an event occurs. 


5.1.1. The patient maker 


The code for the patient-maker is 


(define person-maker
  (lambda (name)
    (let ([this-time 'infinity])
      (lambda msg
        (case (1st msg)
          [(name) name]
          [(time) this-time]
          [(time!) (set! this-time (2nd msg))]
          [else (error msg "Error in person-maker")])))))

(define patient-maker
  (lambda (name)
    (let ([person (person-maker name)])
      (rec patient
        (lambda msg
          (case (1st msg)
            [(event) (receptionist 'enqueue-patient patient)]
            [(new-sick-patient)
             (begin
               (patient 'time! (rand))
               (event-mgr 'record patient))]
            [else (apply person msg)]))))))




Each patient procedure has three functions: to call the receptionist 
when the patient becomes sick, to compute a new time-to-sickness 
which is passed to the event-manager, and to give personal informa- 
tion to procedures which request it. The choice of function is deter- 
mined by a message which is passed to the procedure as an argument. 
Local to a patient procedure are a patient’s name and time-to-sick- 
ness. The patient's name is specified in a list of patient names that is 
supplied by the user as an argument to the startup procedure called 
doctors-patients-problem. The initial time-to-sickness is
computed within the startup procedure and passed to the event-manager.
One special procedure named stop-patient is given a time-to-sick- 
ness which corresponds to the duration of the simulation. When it is 
the turn of stop-patient to become “sick”, the simulation terminates. 
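
The message-passing style used by these procedures can be tried in
isolation. The following minimal sketch restates the person
procedure with standard list accessors; the name make-person is ours
(hypothetical), and the order of arguments to error is an assumption
that may differ between implementations.

```scheme
;; A person is a closure over its name and a mutable time.
;; The first argument of every call selects the operation.
(define (make-person name)
  (let ((this-time 'infinity))
    (lambda msg
      (case (car msg)
        ((name) name)
        ((time) this-time)
        ((time!) (set! this-time (cadr msg)))
        (else (error "unknown message" msg))))))
```

For example, after (define p (make-person 'ann)) and (p 'time! 7),
the call (p 'time) returns 7 and (p 'name) returns ann.
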


5.1.2. The doctor maker 


The code for the doctor-maker is 


(define doctor-maker
  (lambda (name)
    (let ([treated '*] [person (person-maker name)])
      (rec doctor
        (lambda msg
          (case (1st msg)
            [(event)
             (begin
               (treated 'new-sick-patient)
               (receptionist 'release-patient doctor treated))]
            [(assign-patient)
             (begin
               (doctor 'time! (rand))
               (set! treated (2nd msg))
               (event-mgr 'record doctor))]
            [else (apply person msg)]))))))




Each doctor procedure has three functions: to call the receptionist 
when the patient he/she is paired with is about to be released, to 
compute a new time-to-release which is passed to the event-man- 
ager, and to give personal information to requesting procedures. A 
message chooses the function. Local to a doctor procedure are a doc- 
tor’s name and time-to-release. The name of a doctor is specified in 
a list of doctor’s names which is an argument to the startup proce- 
dure. 


5.1.3. The receptionist 


The code for the receptionist is 


(define receptionist
  (let ([patient-Q (queue)] [doctor-list (queue)])
    (let ([p-empty? (patient-Q 'empty?)]
          [d-empty? (doctor-list 'empty?)]
          [p-dequeue! (patient-Q 'dequeue!)]
          [d-dequeue! (doctor-list 'dequeue!)]
          [p-enqueue! (patient-Q 'enqueue!)]
          [d-enqueue! (doctor-list 'enqueue!)])
      (letrec ([treat-waiting-patients-if-docs-available
                (lambda ()
                  (when (and (not (p-empty?)) (not (d-empty?)))
                    (let ([doctor (d-dequeue!)] [patient (p-dequeue!)])
                      (begin
                        (doctor 'assign-patient patient)
                        (SHOW-PAIR-DOC-PAT doctor patient)
                        (treat-waiting-patients-if-docs-available)))))])
        (lambda msg
          (begin
            (case (1st msg)
              [(enqueue-patient)
               (let ([patient (2nd msg)])
                 (begin
                   (SHOW-PERSON-SICK patient)
                   (if (d-empty?)
                       (p-enqueue! patient)
                       (let ([doctor (d-dequeue!)])
                         (begin
                           (doctor 'assign-patient patient)
                           (SHOW-PAIR-DOC-PAT doctor patient))))))]
              [(release-patient)
               (let ([doctor (2nd msg)] [patient (3rd msg)])
                 (begin
                   (SHOW-RELEASE patient doctor)
                   (d-enqueue! doctor)
                   (treat-waiting-patients-if-docs-available)))]
              [(init) (for-each d-enqueue! (2nd msg))])
            (event-mgr 'next-event)))))))


The receptionist does the bookkeeping necessary for doctor-patient
pairing and release. A doctor-list and patient-queue are used to
facilitate this bookkeeping. When no further pairings or releases are
possible at a particular epoch, the receptionist calls the
event-manager to initiate the next event.


5.1.4. The event manager 


The code for the event-manager is 


(define event-mgr
  (let ([event-Q (queue)])
    (let ([insert! (event-Q 'insert!)]
          [dequeue! (event-Q 'dequeue!)]
          [update-all! (event-Q 'update-all!)])
      (lambda msg
        (case (1st msg)
          [(record) (insert! (2nd msg) 'time)]
          [(next-event)
           (let ([person (dequeue!)])
             (begin
               (update-all!
                 (let ([next-time (person 'time)])
                   (lambda (p) (p 'time! (- (p 'time) next-time)))))
               (unless (eq? person stop-patient)
                 (person 'event))))])))))


The event-manager maintains a list of events and the times at which 
they will occur in the order they are due to occur. The event-manager 
has two functions: to record future events in the event-list and to in- 
voke the first procedure in the event-list when the next event is due 
to occur. A call from a doctor or patient procedure results in that 
procedure being placed in the event-list according to a time that ac- 
companies the call. A doctor procedure in the event-list represents 
the future event that his/her patient is released. A patient procedure 
in the event-list represents the future event that the patient be- 
comes ill. The event-manager immediately returns after recording an 
event. When called from the receptionist, the event-manager in- 
vokes the first procedure in the event-list. 


5.2. Control Flow 


Initially, the startup procedure doctors-patients-problem is in- 
voked with a list of patient names and a list of doctor names as argu- 
ments. A stop-patient is given a time and it is recorded in the 
event-list by the event-manager. Then, for each patient name in the 
patient-list, a patient procedure is created and used to record itself 
and its first time-to-sickness in the event-list. A call to the recep- 
tionist to set up its doctor-list completes the startup procedure. 
The second argument to receptionist in this case is a list of doctor 




procedures, one for each doctor in the list of doctor names. The re- 
ceptionist places these doctor procedures in its doctor-list. 


The simulation proceeds as a round-robin of procedure calls. All 
calls to the receptionist result in a call to the event-manager. When 
called from the receptionist, the event-manager invokes the first 
procedure in the event-list. If it is a patient procedure the patient 
has become sick. Thus, the receptionist is invoked with instructions 
to pair the patient with an available doctor when one becomes avail- 
able. If it is a doctor procedure the patient being treated by that doc- 
tor has been released. Thus, the paired patient procedure is invoked 
with instructions to pass its next time-to-sickness to the event-man- 
ager (which records the procedure and its time-to-sickness in the 
event-list) and the receptionist is invoked with instructions to 
make the doctor available. If the patient-queue is not empty, the re- 
ceptionist pairs available doctors with as many waiting patients as 
can be accommodated. Each pairing results in a call to the paired
doctor procedure which computes a time-to-release that is recorded
along with the doctor procedure by the event-manager in the event- 
list. When the receptionist finishes all possible new pairings, it 
calls the event-manager again and the cycle repeats. 


5.3. Queue control 


In order to support the doctor-list, patient-queue, and event- 
list operations, we have included a queue-making procedure called 
queue (see the Appendix). When invoked, this procedure returns an 
empty queue and all the functions needed to maintain it. Access to 
each of these functions is by means of another call with an argument 
specifying it. For example, consider the following sequence 




(let ([event-Q (queue)])
  (let ([dequeue! (event-Q 'dequeue!)] ...)
    (let ([person (dequeue!)])
      (let ([next-time (person 'time)])
        ...))))


The variable event-Q is a procedure of one argument with local state 
corresponding to pointers front and rear, both initialized to nil. de- 
queue! is a procedure of no arguments that removes the first item 
from the queue and returns its value. person is an item that has been 
dequeued; it is either a doctor procedure or a patient procedure. 
next-time is a time that has been returned by person. 


Additional “queue” functions are as follows. enqueue! is a proce- 
dure of one argument called item that is efficiently appended to the 
queue by means of the rear pointer. empty? is a procedure of no ar- 
guments that returns true if and only if the queue is empty. insert! is 
a procedure of two arguments called item and field that is used by 
the event-manager to put item into the queue at a place that keeps the 
queue in increasing order on field. item is assumed to be a
procedure (a doctor or a patient) and field is a time. update-all! is a
procedure of one argument called proc that is applied to each element in
the queue. It is used by the event-manager to reduce the times asso- 
ciated with procedures in the event-queue when a new event occurs. 
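
A simplified sketch of such a queue, restricted to the operations
used by the event-manager, is given below. It is not the queue
procedure of the Appendix: make-event-queue and make-timed are
hypothetical names, the queue is kept as a plain list without front
and rear pointers, and items with equal times are assumed to be
served in order of insertion.

```scheme
;; A stand-in for a doctor or patient: answers 'name, 'time, 'time!.
(define (make-timed name time)
  (lambda msg
    (case (car msg)
      ((name) name)
      ((time) time)
      ((time!) (set! time (cadr msg)))
      (else (error "unknown message" msg)))))

(define (make-event-queue)
  (let ((items '()))
    ;; Insert item so that the queue stays in increasing order of
    ;; the value each item returns for the message `field`.
    (define (insert! item field)
      (let ((key (item field)))
        (let loop ((before '()) (after items))
          (if (or (null? after) (< key ((car after) field)))
              (set! items (append (reverse before) (cons item after)))
              (loop (cons (car after) before) (cdr after))))))
    (define (dequeue!)
      (let ((front (car items)))
        (set! items (cdr items))
        front))
    (define (update-all! proc) (for-each proc items))
    (define (empty?) (null? items))
    ;; One call selects an operation, a second call applies it.
    (lambda (msg)
      (case msg
        ((insert!) insert!)
        ((dequeue!) dequeue!)
        ((update-all!) update-all!)
        ((empty?) empty?)
        (else (error "unknown queue message" msg))))))
```

Inserting items with times 5, 2, and 9 and then dequeuing returns the
item with time 2; subtracting 2 from the remaining times with
update-all! leaves the next item with time 3.
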


6. The Skyline-Matrix Problem


We solve this problem by triangularizing the matrix from the lower 
right corner to the upper left corner and then collecting terms from 
the upper left corner to the lower right corner. The elements of the 
b-vector are adjusted separately while triangularizing. This is only
because it looks more pleasing visually. We pay little penalty for this in
extra code since it is easy in Scheme to abstract the operations of
adding two rows and two scalars to one procedure. Triangularization
and b-vector adjustment are accomplished by the procedure skyline
(Figure 6) which zeros all matrix elements of the rightmost matrix 
column by adding multiples of the last row to higher rows containing 
non-zero elements in that column and recurses on a matrix and b-vec- 
tor that is one variable smaller. Procedure result (Figure 6) does the 
collecting. 


6.1. Input specification 


The input to skyline is a row-list of lists of non-zero row ele- 
ments, a row-skyline list, a column-skyline list, and a b-list. The 
order of the rows in the row-list is from bottom to top and the order 
of elements in each list is reversed. The elements of the row-skyline 
list specify the left non-zero boundaries of the rows, in order, from 
bottom to top. The elements in the column-skyline list specify the 
top non-zero boundaries of the columns in reverse order. The b-list 
is the reverse of the b-vector. Thus, the following input 


(let ([row-list '((4 2 6) (8 6 1 16) (2 10) (12 14))]
      [row-skyline '(2 1 2 1)]
      [column-skyline '(3 1 2 1)]
      [b-list '(10 4 2 1)])
  ...)


represents the following linear system 


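$$
\begin{pmatrix}
14 & 0  & 12 & 0 \\
0  & 10 & 2  & 0 \\
16 & 1  & 6  & 8 \\
0  & 6  & 2  & 4
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 2 \\ 4 \\ 10 \end{pmatrix}
$$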




(define skyline
  (lambda (row-list row-skyline col-skyline b-list n)
    (if (null? row-list)
        '()
        (let* ([row-len (- (add1 n) (car row-skyline))]
               [col-len (- (add1 n) (car col-skyline))]
               [z ((multiplier-list (caar row-list))
                   (sub1 col-len) (cdr row-list))]
               [new-col-sky ((fix-skyline (car col-skyline))
                             row-len col-skyline)]
               [new-row-sky ((fix-skyline (car row-skyline))
                             col-len row-skyline)]
               [new-row-lst ((new-row (car row-list) col-skyline)
                             (sub1 col-len) (cdr row-list) z)]
               [new-b-list (cons (car b-list)
                                 ((new-b-lst (car b-list))
                                  (sub1 col-len) (cdr b-list) z))])
          (cons (car row-list)
                (cons (car new-b-list) (skyline new-row-lst
                                                (cdr new-row-sky)
                                                (cdr new-col-sky)
                                                (cdr new-b-list)
                                                (sub1 n))))))))

(define result
  (lambda (s-list)
    ((rec loop
       (lambda (sol s-list)
         (if (null? s-list)
             sol
             (let ([sum-prod (dot-prod (cdadr s-list) sol)])
               (loop (cons (/ (- (car s-list) sum-prod)
                              (caadr s-list)) sol)
                     (cdr (cdr s-list)))))))
     '() (reverse s-list))))


Figure 6 - The main procedures for the Skyline Matrix Problem 




6.2. Output specification 


The output of skyline is a list which contains alternations of lists of 
numbers and single numbers. The single numbers are b-list ele- 
ments; they may have changed due to row addition. The lists are con- 
secutive matrix elements, in reverse order, at and to the left of the di- 
agonal; they also may have changed due to row addition. From right to
left, the b-list element of the last row is followed by the diagonal
element (as a singleton list) of the last row. Then comes the changed 
b-list element of the next-to-last row and the list of non-zero ele- 
ments of the next-to-last row and so on. 


Procedure result recursively builds a solution vector X from input
supplied by skyline. Suppose sol is a partial solution covering the
last i variables of X. We extend sol to a partial solution covering the
last i + 1 variables of X as follows: take the dot-product of sol and all
but the first matrix element of the (i + 1)st element list (counting from
the right) and call it sumprod; let the first element of the list be called
diagelem and let the (i + 1)st single number be called belem. Then the
(i + 1)st variable from the bottom has the value (belem - sumprod) /
diagelem. Procedure result recursively builds a solution vector in this
fashion.
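
The substitution step can be sketched on its own for a small
triangular system. This is an illustration under our own
assumptions, not the result procedure of Figure 6: solve-lower is a
hypothetical name, the rows and the b-values are passed as two
separate lists, and each row is given in reverse order with its
diagonal element first.

```scheme
;; Dot product over the common prefix of two lists.
(define (dot-prod xs ys)
  (if (or (null? xs) (null? ys))
      0
      (+ (* (car xs) (car ys))
         (dot-prod (cdr xs) (cdr ys)))))

;; rows: rows of a lower-triangular matrix, first row first, each
;;       row reversed so that its diagonal element comes first.
;; bs:   the matching right-hand sides.
;; Returns the solution with the most recently solved variable first.
(define (solve-lower rows bs)
  (let loop ((rows rows) (bs bs) (sol '()))
    (if (null? rows)
        sol
        (let* ((row (car rows))
               (sum-prod (dot-prod (cdr row) sol)))
          (loop (cdr rows)
                (cdr bs)
                (cons (/ (- (car bs) sum-prod) (car row)) sol))))))
```

For the system 2x = 4, x + 3y = 11, the call
(solve-lower '((2) (3 1)) '(4 11)) returns (3 2), that is, y = 3 and
x = 2.
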


6.3. Description of the procedures 


Here we present a brief description of the procedures used in our 
solution of the Skyline Matrix problem. The exact code is given in the 
Appendix. skyline recursively builds a list of matrix-elements and b- 
list pairs from successively smaller submatrices and b-lists. This re- 
quires computing the following new arguments for the recursive call 
to skyline: a row-list, a row-skyline, a column-skyline, and a b- 
list. Of these four, both b-list and row-list require a list of
multipliers, one for each non-zero matrix element above the lower right
diagonal element, which will be used when adding the last row to the
rows above it.


The multiplier-list is built by the procedure multiplier-list.
The size of the list is determined from the first element of the column
skyline. Let q be the lower right matrix element and let
$r_1, r_2, \ldots$ be the non-zero elements in the rightmost column in
order above q. Then the multiplier-list produced is
$(-r_1/q, -r_2/q, \ldots)$. When the last row is multiplied by $-r_i/q$
and added to the ith row above the last, the rightmost element of that
row is zeroed. The call to multiplier-list in skyline is in the let*.
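
The effect of one multiplier can be checked on the last two rows of
the example input of Section 6.1. The sketch assumes dense rows
written left to right, which is simpler than the skyline lists used
by our procedures; multipliers and add-scaled are hypothetical
names.

```scheme
;; Multipliers -r/q for the elements rs lying above the
;; lower right element q.
(define (multipliers q rs)
  (map (lambda (r) (- (/ r q))) rs))

;; row + m * last-row, element by element (rows of equal length).
(define (add-scaled m last-row row)
  (map (lambda (x y) (+ x (* m y))) row last-row))
```

With q = 4 and the single element 8 above it, (multipliers 4 '(8))
is (-2), and (add-scaled -2 '(0 6 2 4) '(16 1 6 8)) gives
(16 -11 2 0): the rightmost element of the upper row is zeroed.
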


Both new row-list and b-list are similarly built from the multi- 
plier-list and the old row-list and b-list. The difference is that, 
in the case of the row-list, lists are added whereas, in the case of the 
b-list, scalars are added. We abstract out the common part and call 
it abstract-1. This procedure takes another procedure f as argument 
and returns a procedure which takes two lists, list-1 and list-2, and 
a number n as arguments. When invoked, the returned procedure 
walks down both list-1 and list-2 for a number of steps specified by 
n, applies f to each pair of list-1 and list-2 elements encountered, 
and returns a list of results concatenated with the remainder of
list-1.
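
A minimal sketch of this abstraction follows. The exact code is in
the Appendix and may differ in detail; the argument order
(n, list-1, list-2) is an assumption taken from the calls in Figure 6,
and scaled-add is a hypothetical example of an argument f.

```scheme
;; Returns a procedure that applies f to the first n pairs of
;; elements of list-1 and list-2 and keeps the rest of list-1.
(define (abstract-1 f)
  (lambda (n list-1 list-2)
    (let loop ((n n) (l1 list-1) (l2 list-2))
      (if (or (= n 0) (null? l1) (null? l2))
          l1
          (cons (f (car l1) (car l2))
                (loop (- n 1) (cdr l1) (cdr l2)))))))

;; Example f for scalars: add b times a multiplier to an old value.
(define (scaled-add b)
  (abstract-1 (lambda (old m) (+ old (* b m)))))
```

For example, ((abstract-1 +) 2 '(1 2 3 4) '(10 20)) returns
(11 22 3 4), and ((scaled-add 10) 1 '(4 2 1) '(-2)) returns (-16 2 1).
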


The procedure new-b-lst builds a new b-list by invoking
abstract-1 with f a procedure which multiplies a b-value by an ele-
ment of list-2 (list-2 is the multiplier-list) and adds the result 
to an element of list-1 (list-1 is the old b-list). 


The procedure new-row builds a new row-list by invoking abstract-1
with f a procedure which applies the procedure returned by add-row
to an element of list-2 (again, the multiplier-list) and an element 
of list-1 (now a list of rows). The procedure returned by add-row 
takes the product of a multiplier and all elements of the first row in 
the old row-list and adds the resulting list to the elements of a row
in the rest of the row-list. Since elements of a row in the row-list
may be separated by zeros in the given matrix, the column-skyline 
must be used to determine the proper alignment of row elements 
when performing the row additions. Hence the old column-skyline is 
an argument to this procedure. The procedure does not do anything 
special in the unlikely event that a new element in a row is zero; such 
an element is recorded as a zero. 


It remains to show how the new column-skyline and row-skyline 
are built. The procedure for fixing both is fix-skyline. This proce- 
dure adjusts elements of the old column-skyline or row-skyline 
which are associated with columns or rows affected by the most re- 
cent set of row additions. Since new zero elements are recorded as 
such, this procedure only sets each new column-skyline or row- 
skyline element to the minimum of its old value and the value of the 
column-skyline or row-skyline element corresponding to the zeroed 
column. 


7. Conclusions and Issues 


As illustrated in this chapter, Scheme is an essentially simple, 
flexible, and orthogonal language. Using the lambda tool, we have 
solved a series of problems based on the versatile features of Scheme 
and illustrating streams, eagerness vs. laziness, procedures as first- 
class objects, programming actors and programming with actors, and 
the controlled use of side-effects. 


Still this chapter does not illustrate the full potential of program- 
ming with Scheme, most remarkably regarding continuations. To this 
effect, we kindly direct the interested reader to the Scheme literature 
as regularly listed in the ACM publication Lisp Pointers. 




Acknowledgements 


Dr. John Franco was supported by the Air Force Office of Scientific 
Research under grant AFOSR 89-0186 and the FAW Ulm, Germany. 
Dr. Daniel Friedman was supported by the National Science Foundation 
under grant CCR 87-02117 and by the Air Force Office of Scientific 
Research under grant AFOSR 89-0186. Dr. Olivier Danvy contributed 
to this work during the Fall 1989 semester while visiting the 
Computer Science Department of Indiana University. 


References 


1. Dybvig, R. K. The Scheme Programming Language. Prentice-Hall,
Englewood Cliffs, NJ, 1987.


2. Kohlbecker, E. E. Syntactic Extensions in the Programming
Language Lisp. Ph.D. Dissertation, Department of Computer
Science, Indiana University, Bloomington, IN, 1986.


3. Rees, J. and Clinger, W. C. (eds.). Revised^3 Report on the
Algorithmic Language Scheme. SIGPLAN Notices, 21(12), December
1986, pp. 37-79.


4. Sussman, G. J. and Steele, G. L. Scheme: an interpreter for
extended lambda calculus. Technical Report, Artificial Intelligence
Laboratory, Massachusetts Institute of Technology, Cambridge, MA,
1975.




Appendix 


**********************
****  Paraffins   ****
**********************


(define map
  (lambda (p l)
    ((rec loop (lambda (l)
                 (unless (null? l)
                   (begin
                     (p (car l))
                     (loop (cdr l)))))) l)))


(define combine
  (lambda (x y f)
    (if (or (null? x) (null? y))
        '()
        (append (map (lambda (z) (append (list (car x)) z)) y)
                (combine (cdr x) (f y) f)))))


(define assemble
  (lambda (a b c d f-a f-b f-c)
    (cond
      [(null? a) '()]
      [(and (null? b) (null? c) (null? d))
       (map (lambda (y) (list y)) a)]
      [else (let ([argb (if (null? b) b (f-a b))]
                  [lst (assemble b c d '() f-b f-c f-c)])
              (append (combine (list (car a)) lst f-a)
                      (assemble (cdr a) argb c d f-a f-b f-c)))])))


(define bracket-parts
  (lambda (l)
    (if (null? l)
        '()
        (begin
          (display '<) (display (car l))
          (display '>)
          (bracket-parts (cdr l))))))


(define print-tree
  (lambda (type a b c d f-a f-b f-c)
    (let ([assem-list (assemble a b c d f-a f-b f-c)])
      (if (null? assem-list)
          '()
          (map (lambda (l)
                 (begin
                   (when (eq? type 'odd)
                     (display "<()>"))
                   (bracket-parts l)
                   (newline)))
               assem-list)))))


(define list-max
  (lambda (l)
    (if (null? l) 0 (max (car l) (list-max (cdr l))))))


(define height
  (lambda (l)
    (if (null? l) 0 (1+ (list-max (map height l))))))


(define last-items
  (lambda (l)
    (if (= 2 (length l)) l (last-items (cdr l)))))


(define sizer
  (lambda (n x h-spec)
    (let* ([items (last-items h-spec)]
           [next-last (car items)]
           [last (cadr items)]
           [new-h-spec (reverse (cdr (reverse h-spec)))])
      (if (= (length h-spec) 2)
          (if (< x n)
              '()
              (cons (cons x (list n)) (sizer (1+ n) (1- x) h-spec)))
          (let ([bnd (if (= last next-last) n 1)])
            (if (< x 1)
                '()
                (append (map (lambda (y) (append y (list n)))
                             (sizer bnd (- x bnd) new-h-spec))
                        (sizer (1+ n) (1- x) h-spec))))))))


(define id (lambda (x) x)) 


(define array-ref
  (lambda (A n m) (vector-ref (vector-ref A n) m)))


(define array-set!
  (lambda (A n m val) (vector-set! (vector-ref A n) m val)))


(define create-array
  (lambda (n)
    (let ([f (decr-ord-list (1+ n))])
      (apply vector (map (lambda (x) (make-vector (1+ n) '())) f)))))


(define incr-ord-list
  (lambda (n)
    ((rec up (lambda (acc x) (if (zero? x)
                                 acc (up (cons x acc) (1- x)))))
     '() n)))




;** (decr-ord-list 6) -> (6 5 4 3 2 1 0)
(define decr-ord-list
  (lambda (n)
    (if (zero? n) '(0) (cons n (decr-ord-list (1- n))))))


;** (decr-ord-pair 3) -> ((3 3) (3 2) (3 1) (3 0) (2 2)...
(define decr-ord-pair
  (lambda (n)
    (if (zero? n)
        (list (list 0 0))
        (append (map (lambda (x) (list n x)) (decr-ord-list n))
                (decr-ord-pair (1- n))))))


(define decr-elements
  (lambda (n)
    (map list (decr-ord-list n))))


(define height-proc
  (lambda (n h)
    (cond
      [(< n 0)
       '()]
      [(= h 2)
       (cons (list n n) (height-proc (1- n) h))]
      [else
       (let ([map-list (if (= h 3) (decr-elements n) (decr-ord-pair n))])
         (append (map (lambda (y) (append (list n n) y)) map-list)
                 (height-proc (1- n) h)))])))


(define get-trees (lambda (B n) (vector-ref B n)))


(define build-trees
  (lambda (B l)
    (if (null? l)
        '()
        (let* ([subtree-config (car l)]
               [nodes-c (car subtree-config)]
               [nodes-b (cadr subtree-config)]
               [nodes-a (caddr subtree-config)]
               [trees-c (get-trees B nodes-c)]
               [trees-b (get-trees B nodes-b)]
               [trees-a (get-trees B nodes-a)]
               [op1 (if (= nodes-a nodes-b) cdr id)]
               [op2 (if (= nodes-b nodes-c) cdr id)])
          (append (assemble trees-c trees-b trees-a '() op2 op1 id)
                  (build-trees B (cdr l)))))))


(define trees
  (lambda (B n)
    (cond
      [(= 1 n) (list '())]
      [(zero? n) '()]
      [else (build-trees B (sizer 0 (1- n) '(1 1 1)))])))


;****************************
;**** Doctor's Office ****
;****************************


(define stop-patient (patient-maker 'stop)) 


(define for-each
  (lambda (p l)
    ((rec loop (lambda (l)
                 (unless (null? l)
                   (begin
                     (p (car l))
                     (loop (cdr l)))))) l)))


(define doctors-patients-problem
  (lambda (patients doctors)
    (begin
      (stop-patient 'time! 100)
      (event-mgr 'record stop-patient)
      (for-each (lambda (p) ((patient-maker p) 'new-sick-patient)) patients)
      (receptionist 'init (map doctor-maker doctors)))))


(define test
  (lambda ()
    (doctors-patients-problem
      '("John" "Jim" "Joe" "Joy") '("Grace" "Sam"))))


(define queue
  (lambda ()
    (let ([front '()] [rear '()] [secure (lambda (expr) 'done)])
      (letrec ([enqueue!
                (lambda (item)
                  (secure (let ([list-of-item (cons item '())])
                            (if (null? front)
                                (begin (set! rear list-of-item)
                                       (set! front list-of-item))
                                (begin (set-cdr! rear list-of-item)
                                       (set! rear list-of-item))))))]
               [update-all! (lambda (proc) (for-each proc front))]
               [empty? (lambda () (null? front))]
               [dequeue!
                (lambda ()
                  (if (null? front)
                      (begin (write "The queue is empty.") (newline))
                      (let ([item (car front)])
                        (begin (set! front (cdr front))
                               (when (null? front) (set! rear '()))
                               item))))]
               [insert!
                (lambda (item field)
                  (let ([this-time (item field)])
                    (if (or (null? front) (< this-time ((car front) field)))
                        (if (null? front)
                            (enqueue! item)
                            (let ([c (cons item front)])
                              (set! front c)))
                        ((rec loop (lambda (before after)
                                     (cond
                                       [(null? after) (enqueue! item)]
                                       [(< this-time ((car after) field))
                                        (let ([c (cons item after)])
                                          (set-cdr! before c))]
                                       [else (loop after (cdr after))])))
                         front (cdr front)))))])
        (lambda (msg)
          (case msg
            [(update-all!) update-all!]
            [(enqueue!) enqueue!]
            [(dequeue!) dequeue!]
            [(insert!) insert!]
            [(empty?) empty?]
            [else (error "Bad command to queue:" msg)]))))))


(define rand
  (let ([q 1])
    (lambda ()
      (if (eq? q 15) (set! q 1) (set! q (+ q 1))) q)))


(define SHOW-RELEASE
  (lambda (patient doctor)
    (begin
      (display "Dr. ")
      (display (doctor 'name))
      (display " releases ")
      (display (patient 'name))
      (display ".")
      (newline))))


(define SHOW-PERSON-SICK
  (lambda (patient)
    (begin
      (display (patient 'name))
      (display " has just gotten sick.")
      (newline))))


(define SHOW-PAIR-DOC-PAT
  (lambda (doc patient)
    (begin
      (display "Dr. ")
      (display (doc 'name))
      (display " treats ")
      (display (patient 'name))
      (display ". Time to treatment end is ")
      (display (doc 'time))
      (display ".")
      (newline))))


(define 1st car)
(define 2nd cadr)
(define 3rd caddr)


;***************************
;**** Skyline Matrix ****
;***************************


(define dot-prod
  (rec alpha (lambda (A B)
               (if (null? A)
                   0
                   (+ (* (car A) (car B)) (alpha (cdr A) (cdr B)))))))


(define fix-skyline
  (lambda (m)
    (rec alpha (lambda (n lst)
                 (if (zero? n)
                     lst
                     (cons (min m (car lst))
                           (alpha (sub1 n) (cdr lst))))))))


(define multiplier-list
  (lambda (q)
    (rec alpha (lambda (n lst)
                 (if (zero? n)
                     '()
                     (cons (- 0 (/ (caar lst) q))
                           (alpha (sub1 n) (cdr lst))))))))


(define add-row
  (lambda (row-numb mult)
    (rec alpha
      (lambda (t-row col old-row)
        (if (null? t-row)
            old-row
            (if (or (> (car col) row-numb) (null? old-row))
                (cons (+ (* mult (car t-row)) 0)
                      (alpha (cdr t-row) (cdr col) old-row))
                (cons (+ (* mult (car t-row)) (car old-row))
                      (alpha (cdr t-row) (cdr col) (cdr old-row))))))))))


(define abstract-1
  (lambda (f)
    (rec alpha (lambda (n list-1 list-2)
                 (if (zero? n)
                     list-1
                     (cons (f n (car list-1) (car list-2))
                           (alpha (sub1 n) (cdr list-1) (cdr list-2))))))))


(define new-b-lst
  (lambda (b-value)
    (abstract-1 (lambda (i z x) (+ (* x b-value) z)))))


(define new-row
  (lambda (a col-skyline)
    (abstract-1 (lambda (i ol ml)
                  (cdr ((add-row (+ (sub1 i) (car col-skyline)) ml)
                        a col-skyline ol))))))


(define main
  (lambda ()
    (let ([rows '((4 2 6) (8 6 1 16) (2 10) (12 14))]
          [row-sky '(2 1 2 1)]
          [b-list '(10 4 2 1)]
          [col-sky '(3 1 2 1)])
      (let ([s-list (skyline rows row-sky col-sky b-list (length rows))])
        (begin
          (display s-list) (newline) (newline)
          (result s-list))))))


A Comparative Study of Parallel Programming Languages: 
The Salishan Problems / J.T. Feo (Editor) 
© 1992 Elsevier Science Publishers B.V. All rights reserved.


Sisal 


John Feo 
Computing Research Group, L-306 
Lawrence Livermore National Laboratory 
Livermore, CA 94550 


1. Introduction


Sisal (Streams and Iterations in a Single Assignment Language), a
derivative of Val, was defined in 1983 [4] and revised in 1985 [5].
Since 1985 the language definition has remained constant, providing a
stable testbed for programming language research and functional
program development. The Sisal Language Project began as a
collaborative effort between Lawrence Livermore National Laboratory,
Colorado State University, University of Manchester, and Digital
Equipment Corporation. Today, only LLNL and CSU continue to develop
and promote the language; however, Sisal and its intermediate form
IF1 [7] are being used by research groups across the United States
and around the world.


The Project has six objectives: 


1. to define a general-purpose functional language, 


2. to define a language-independent intermediate form for 
dataflow graphs, 


3. to develop optimization techniques for high perfor- 
mance parallel applicative computing, 


4. to develop a microtasking environment that supports 
dataflow on conventional computer systems, 




5. to achieve execution performance comparable to imper- 
ative languages, and 


6. to validate the functional style of programming for 
large-scale scientific applications. 


The objectives emphasize usability and performance. Goals four, five, 
and six set the Sisal effort apart from other functional language pro- 
jects. They reflect the computing environment at Lawrence Livermore 
National Laboratory and other government facilities. 


Functional languages promote the development of correct, deter- 
minate parallel programs. Functional programs are free of aliases, side 
effects, and time dependent errors. Results are determinate regard- 
less of architecture, operating system, or execution environment. 
Unlike parallel imperative languages, functional languages decrease
the programming burden. Users need define only what is to be
computed and encode only the data dependencies among operations.
The compiler and the run time system are responsible for scheduling 
operations, communicating data values, synchronizing operations, and 
managing memory. It is as easy to write a functional program, which 
is implicitly parallel, as it is to write a sequential imperative program. 
Relieved of parallel programming’s most difficult chores, the user is 
free to concentrate on algorithm design and application development. 


Section 2 introduces Sisal, discussing those language features used 
in subsequent sections. For the full language definition see [5]. In 
Sections 3, 4, 5, and 6 we present solutions to the four Salishan pro- 
blems. Since functional programming requires a different thought 
process than imperative programming, we describe how we formu- 
lated each solution. In Section 7, we conclude with some general 
remarks regarding Sisal and functional programming. 




2. Language Definition 


Sisal is a strongly typed, general purpose functional language that 
supports data types and operations for scientific computing. To min- 
imize learning time and enhance readability, the creators of Sisal 
adopted a Pascal-like, block syntax delimited by keywords. Since most 
scientific programs are written once but maintained over many years, 
the designers sought to improve readability wherever possible. Sisal 
programs are slightly wordy, but easy to read and to understand. 
Some language features exchange elegance for readability.


Sisal has several important semantic properties. First, the language 
is mathematically sound—functions map inputs to outputs without side 
effects. Second, names are referentially transparent; that is, they 
stand for values rather than memory locations. Third, the language is 
single-assignment. A name may be assigned a value only once within 
each scope. A Sisal tenet, not required by its functional semantics but 
enforced by the compiler to aid readability, is that all names must be 
defined before they are used. 


2.1. Types 


Sisal supports the standard scalar data types: boolean, char,
integer, real, and double_real. It also includes the aggregates:
array, record, stream, and union. Users may define aggregate types
by using the type statement. For example,


type Istr = stream [integer]; 
type OneR = array [real]; 
type TwoR = array [OneR];


define, respectively, a stream of integers, an array of reals, and an ar- 
ray of arrays of reals. The latter is equivalent mathematically to a two- 
dimensional array of real values. 




In Sisal both arrays and streams are homogeneous aggregates of any 
standard or user-defined type. Arrays support random access whereas 
stream elements are available only in FIFO order. That is, the i-th
element of a stream must be consumed before the (i+1)-st element may
be consumed. Consequently, Sisal streams cannot deadlock. Array
declarations include neither size nor bounds information. An array’s 
size, lower bound, and shape are determined during execution. Since 
the components of a multi-dimensional array are arrays, each may have 
a different length and lower bound. We say that Sisal’s arrays are 
ragged. 


The types of names are not declared; instead, the compiler infers 
the type of each name from the surrounding context. The two excep- 
tions are the formal parameters and results of functions. These too 
could be inferred, but at the expense of readability. The types are de- 
clared in the function headers. 


2.2. Functions 


A function can take zero or more arguments and must return one or 
more values. The type of each formal parameter and result value is 
declared in the function header. For example, the function circle 


function circle(radius: real returns real, real) 
2.0 * 3.14 * radius, 3.14 * radius * radius 


end function 


takes one formal parameter, radius, of type real, and returns two
real values, the circumference and area of the circle. The number of 
values a function or an expression returns is referred to as the arity of 
the function or expression. A function has access to only its argu- 
ments. There are no global values and functions do not retain state be- 
tween invocations. The effect of invoking a function is limited to the 
values it returns—there are no side effects. 




A function name and list of actual parameters can appear anywhere 
an expression of the same type and arity can appear. For example, the 
statements 


circum, area := circle(radius);
and
a, b := new_function(circle(radius));


are legal statements provided new_function takes two real values as 
input and returns two results. But the statement 


circum := circle (radius) 


is illegal because there is only a single name on the left-hand side. 


2.3. Let expressions 


The let expression defines a set of names, and then uses the 
names to compute one or more values. The expression, 


let 

% the let clause 
in 

% the in clause 
end let 


consists of two clauses: a let clause and an in clause. The latter defines 
the arity, type, and value of the let expression. For example, the ex- 
pression 


let
    pi := 3.14
in
    2.0 * pi * radius, pi * radius * radius
end let;




has arity two, type (real, real), and returns the circumference and 
area of a circle. The let expression is equivalent to the function cir- 
cle defined before and could replace the two expressions in the func- 
tion’s body. 


2.4. For expressions 


The for expression, 


for <range generator> 
<loop body> 
returns <returns clause> 


end for 


is the parallel loop form in Sisal. It has three parts: a range generator, 
a loop body, and a returns clause. 


The range generator is a dot or cross product of a set of sequences 
or scatters (see Figure 1). An instance of the loop body is executed for 
each index, value, or n-tuple of the range. The range generator speci- 
fies the order of reduction, and defines the size and structure of any 
generated aggregate object. For example, the expression 


for i in 1, n cross j in 1, m
returns array of (i + j)
end for


returns a two-dimensional array of n rows and m columns. At first,
many Sisal programmers fail to understand the subtleties of this
syntax. A common mistake is to write the transpose of an (n x m)
matrix as


for i in 1, n cross j in 1, m
returns array of X[j, i]
end for




Range Generator                 Comments

x in A                          a scatter
i in 1, n                       a sequence
x in A dot y in B               a dot product of two scatters
i in 1, n cross j in 1, n       a cross product of two sequences

Figure 1 - Forms of the Range Generator


But this returns an (n x m) matrix and not an (m x n) matrix. The
correct expression is


for i in 1, m cross j in 1, n
returns array of X[j, i]
end for
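To make the point about shape concrete, here is a minimal sketch in Python rather than Sisal (the function name transpose and the sample matrix are illustrative assumptions, not part of the original text). As in the correct Sisal expression, the ranges of the two generators fix the shape of the result: the outer index runs over 1..m and the inner index over 1..n, so the result has m rows and n columns.

```python
def transpose(X, n, m):
    """X has n rows and m columns; the result has m rows and n columns."""
    # Outer index i ranges over 1..m and inner index j over 1..n, mirroring
    # "for i in 1, m cross j in 1, n returns array of X[j, i]".
    # The "- 1" offsets convert the 1-based indices to Python's 0-based lists.
    return [[X[j - 1][i - 1] for j in range(1, n + 1)]
            for i in range(1, m + 1)]


X = [[1, 2, 3],
     [4, 5, 6]]            # n = 2 rows, m = 3 columns
T = transpose(X, 2, 3)     # 3 rows, 2 columns
```

Swapping the two ranges while still indexing X[j, i] is exactly the mistake described above: the indices run out of bounds or the result has the wrong shape.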


The loop body is a set of name definitions. Sisal’s semantics pre- 
vents an instance of the loop body from referencing values computed 
by any other instance. Thus, the instances of the loop body are data 
independent and may be executed in parallel. 


The returns clause defines the arity, type, and value of the for ex- 
pression. Each result is a reduction of values defined in the loop body. 
The order of reduction is determinate and equivalent to the sequential 
execution of the loop bodies. Figure 2 lists the eight reduction opera- 
tions supported by Sisal. The values contributing to a reduction can be 
filtered by including a when clause. For example, 


array of x when x > 0 


returns an array of only positive values. If none of the values are
positive, an empty array of the same type as x is returned.
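As a rough analogy only, the filtered reductions can be sketched in Python (not Sisal; the sample values are assumptions chosen for illustration). A comprehension with a condition plays the role of the when clause, and an empty list stands in for the empty array.

```python
values = [3, -1, 4, -1, 5]

# Analogue of "array of x when x > 0": keep only the positive values.
positives = [x for x in values if x > 0]

# Analogue of "value of sum x when x > 0": reduce only the filtered values.
total = sum(x for x in values if x > 0)

# When no value passes the filter, the result is an empty aggregate.
none_positive = [x for x in [-2, -7] if x > 0]
```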


As an illustration of the for expression, consider matrix
multiplication. Let a and b be real arrays of size (n x m) and
(m x n), respectively; then the product of a and b is




Returns Clause            Comments

value of x                returns the last value of x
array of x                returns an array of x values
stream of x               returns a stream of x values
value of sum x            returns the sum of x values
value of product x        returns the product of x values
value of least x          returns the smallest x value
value of greatest x       returns the largest x value
value of catenate x       returns the array formed by catenating
                          the x values (note x must be an array)

Figure 2 - The Returns Clause


$$\sum_{k=1}^{m} a(i,k) \, b(k,j), \qquad 1 \le i, j \le n \qquad (2.1)$$


We begin by writing 


function mmul(n,m: integer; a,b: TwoR returns TwoR) 


end function 


which defines mmul as a function of four inputs: n and m of type
integer, and a and b of type TwoR (defined previously). The function
returns a single value of type TwoR, the product of a and b.


The function's body is the Sisal expression for Equation 2.1. Since
the equation computes n^2 independent real values and assembles them
into an array, we want to use the following for expression


for i in 1, n cross j in 1, n
    Cij :=
returns array of Cij
end for




Cij is the inner product of the i-th row of a and j-th column of b. We
write the inner product of two m-element vectors as


for k in 1, m
returns value of sum a[i,k] * b[k,j]
end for
Putting everything together, we have 


function mmul(n,m: integer; a,b: TwoR returns TwoR)
    for i in 1, n cross j in 1, n
        Cij := for k in 1, m
               returns value of sum a[i,k] * b[k,j]
               end for
    returns array of Cij
    end for
end function


There are three important observations to make. First, the func- 
tion expresses only the necessary mathematical computations. The 
implementation of the function, which is 100% parallel, is unspeci- 
fied. Second, the function is dynamic. The resources necessary to ex- 
ecute the expression depend entirely on n and m. There is no static 
allocation of memory or processors. Third, unlike an imperative func- 
tion which would first allocate the memory for the result and then fill 
it in, the Sisal function consists of a single expression that both cre- 
ates and defines the result. 
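For readers more familiar with imperative notation, the same structure can be sketched in Python (an illustration only, not Sisal; 0-based indexing and nested lists are assumptions of this sketch). Like the Sisal function, it is a single expression that creates and defines the result, with no preallocated output that is filled in afterwards.

```python
def mmul(n, m, a, b):
    """Product of an n x m matrix a and an m x n matrix b (lists of rows)."""
    # The two outer comprehensions correspond to "for i in 1, n cross j in 1, n";
    # the inner sum corresponds to "for k in 1, m returns value of sum ...".
    return [[sum(a[i][k] * b[k][j] for k in range(m))
             for j in range(n)]
            for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6]]        # n = 2, m = 3
b = [[1, 0],
     [0, 1],
     [1, 1]]           # m = 3, n = 2
c = mmul(2, 3, a, b)
```

Python evaluates the comprehension sequentially; the point of the sketch is only that each c[i][j] depends on nothing but a and b, which is what lets a Sisal implementation compute the elements in parallel.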


2.5. For Initial expressions 


The for initial expression, 




for initial                        for initial
    <initialization>                   <initialization>
while <test> repeat                repeat
    <loop body>                        <loop body>
returns <returns clause>           until <test>
end for                            returns <returns clause>
                                   end for


permits loop-carried dependencies, but retains single-assignment
semantics. It comprises four segments: initialization, test, loop
body, and returns clause. The initialization segment defines all loop
constants and assigns initial values to all loop-carried names. It is
the first iteration of the loop.


The test may appear either before or after the body. Two forms are 
supported in Sisal: while <test> and until <test>. The loop body 
computes new values for all loop-carried names. An instance of the 
body may refer to any loop-carried name defined in the previous in- 
stance by prefixing the name with the keyword old. Thus, old a 
refers to the value of a on the previous iteration. Note that defining 
the value of a on the present iteration does not destroy the old value. 
The rebinding of loop-carried names to values is implicit and occurs 
between iterations. 


The returns clause of the for initial expression is identical in 
syntax and semantics to the returns clause of the for expression. 


As an example of the for initial expression, consider the function
first_sum. Let x be a real vector of length n; then y(i) is


$$y(i) = \sum_{j=1}^{i} x(j), \qquad 1 \le i \le n \qquad (2.2)$$


While the elements of y are data independent, using a for expression
would result in O(n^2) additions. A more efficient algorithm can be had
using a for initial expression.




First, we rewrite Equation 2.2 as


$$y(i) = \begin{cases} x(1), & i = 1 \\ y(i-1) + x(i), & i \ge 2 \end{cases} \qquad (2.3)$$


and then write Equation 2.3 in Sisal 


function first_sum(n: integer; x: OneR returns OneR)
    for initial
        i := 1;
        y := x[1]
    while i < n repeat
        i := old i + 1;
        y := old y + x[i]
    returns array of y
    end for
end function


Again, the translation from mathematics to Sisal is straightforward. 
The how of the loop—allocating memory for the result and assigning 
values to positions—is implicit, implied by the semantics of the ex- 
pression. 
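The loop-carried structure can be mimicked in Python as follows (a sketch only, not Sisal; the explicit old_i and old_y names and the result list are assumptions made to mirror the old keyword and the "returns array of y" clause). Each pass computes the new values from the old ones, and the value of y from every iteration, including the initialization, is collected.

```python
def first_sum(n, x):
    """Running sums of the first n elements of x: O(n) additions."""
    i = 1                      # initialization: the first iteration
    y = x[0]                   # x[1] in the 1-based Sisal code
    result = [y]               # "returns array of y" gathers y from each iteration
    while i < n:
        old_i, old_y = i, y    # values from the previous iteration
        i = old_i + 1
        y = old_y + x[i - 1]   # x[i] in 1-based indexing
        result.append(y)
    return result


sums = first_sum(4, [1.0, 2.0, 3.0, 4.0])
```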


2.6. Array and stream operations 


Figures 3 and 4 list some of the array and stream operations in 
Sisal. It is important to remember that these, and all operations in 
Sisal, retain single-assignment semantics. The statement 


A[1: 0, 1, 2]


does not replace the first, second, and third elements of A, as in an 
imperative language, but instead, creates a new array identical to A 
with the first, second, and third elements set to 0, 1, and 2. The 
function 


stream_first (A) 




Array Operations                 Comments

array OneR                       creates an empty array (type is required)

array OneR [1: 1.0, 2.0, ...]    creates an array of the listed values
                                 with lower bound 1

array_fill(1, n, 0)              creates an array of 0s, lower bound 1,
                                 upper bound n

A[1: 0, 1, 2]                    creates a new array identical to A with
                                 elements 1, 2, and 3 set to 0, 1, and 2

A || B                           creates a new array which is the
                                 catenation of A and B

Figure 3 - Array Operations


Stream Operations                Comments

stream StrI [1, 2, 3, ...]       creates a stream of integer values (type
                                 is optional)

stream_append(A, v)              creates a new stream identical to A with
                                 the element v appended to the tail

A || B                           creates a new stream which is the
                                 catenation of A and B

stream_rest(A)                   creates a new stream identical to A with
                                 the first element removed

stream_first(A)                  returns the first element of A

stream_empty(A)                  true if A is empty and the producer has
                                 terminated

Figure 4 - Stream Operations


returns the first element of A but does not modify A (i.e., it does not
remove the first element). To define a stream comprised of all the 
elements of A except the first element, the user must write 




B := stream_rest (A) 


stream_empty(A) returns true only if all of A is consumed and the 
producer of A has terminated. It does not return true simply because 
no value is available. Such behavior would introduce non-determinism 
and is not supported. Moreover, stream_first always returns a value. 
If the producer is slow and no value is available, the function waits for 
a value. If the stream is empty and the producer has terminated, an 
error value is returned. 


On close inspection of the array operations listed in Figure 3 and 
the loop structures described in the previous section one might con- 
clude that Sisal’s single-assignment semantics, and the copy and 
memory management operations implied by those semantics, would 
make Sisal inappropriate for scientific computing. However, studies 
show [1, 6] that compile-time analysis can eliminate virtually all un- 
necessary copy and memory management operations. In fact, Sisal’s 
conservative semantics and explicit array operations aid the analysis. 
We fully expect to achieve execution performance comparable to im- 
perative languages. 


3. Hamming’s Problem, Extended 
3.1. Understanding Hamming’s Problem 


Given an integer n and a set of primes {a, b, c, ...}, generate in
order and without duplicates all integers of the form


$$a^i \, b^j \, c^k \cdots \le n$$


One way to solve Hamming’s Problem is to compute the cross product 
of the sets 


$$\{a^i \mid i \ge 0\},\ \{b^j \mid j \ge 0\},\ \{c^k \mid k \ge 0\},\ \ldots$$


forming a set of tuples of the form 




$$(a^i, b^j, c^k, \ldots), \qquad i, j, k \ge 0$$


Multiplying together the elements of each tuple, sorting the products 
in increasing order, and discarding the integers greater than n, solves 
Hamming’s Problem. 


For example, let n = 10 and the set of primes be {2, 3, 5}. First we
form the three sets


{1, 2, 4, 8}, {1, 3, 9}, {1, 5}


Note that we have discarded integers greater than 10. Next we form 
the cross product of the three sets 


{ (1,1,1), (1,1,5), (1,3,1), (1,3,5), (1,9,1), (1,9,5), (2,1,1), (2,1,5),
  (2,3,1), (2,3,5), (2,9,1), (2,9,5), (4,1,1), (4,1,5), (4,3,1), (4,3,5),
  (4,9,1), (4,9,5), (8,1,1), (8,1,5), (8,3,1), (8,3,5), (8,9,1), (8,9,5)
}


Multiplying together the elements of each tuple, sorting the products 
in increasing order, and discarding values greater than 10, yields the 
set 


{ 1, 2, 3, 4, 5, 6, 8, 9, 10 } 
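The cross-product approach just described can be sketched in a few lines of Python (an illustration, not Sisal; the function names powers_up_to and hamming_by_cross_product are hypothetical). Because the primes are distinct, unique factorization guarantees that no two tuples give the same product, so no duplicate removal is needed.

```python
from itertools import product
from math import prod


def powers_up_to(p, n):
    """Powers of p (starting at p**0 = 1) that do not exceed n."""
    out, v = [], 1
    while v <= n:
        out.append(v)
        v *= p
    return out


def hamming_by_cross_product(n, primes):
    """Multiply every tuple of the cross product, keep products <= n, sort."""
    sets = [powers_up_to(p, n) for p in primes]
    products = (prod(t) for t in product(*sets))
    return sorted(v for v in products if v <= n)


result = hamming_by_cross_product(10, [2, 3, 5])
tuples = list(product(*[powers_up_to(p, 10) for p in [2, 3, 5]]))
```

Most of the 24 tuples in the example are discarded, which is the inefficiency that the task-graph formulation below avoids.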


Figure 5a shows a task graph to compute the powers of 2. It con- 
sists of a task and three edges: i, s, and b. The edges act as FIFO 
queues of zero or more values. The instruction set for the task is: 

1. remove the token on edge i, call its value x 
2. repeat 

3. output x on edge s 

4. output 2 times x on edge b 


5. remove the token on edge b, call its value x 
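The five steps above can be sketched in Python (not Sisal). The function name powers_task and the limit parameter are assumptions of this sketch: the instruction set as stated repeats forever, so a bound is added here only to make the example terminate. A deque stands in for the FIFO edge b and a list for the output edge s.

```python
from collections import deque


def powers_task(initial, factor, limit):
    """Simulate the single task of Figure 5a, stopping once x exceeds limit."""
    b = deque()                  # edge b: feedback FIFO queue
    s = []                       # edge s: tokens output by the task
    x = initial                  # step 1: remove the token on edge i
    while x <= limit:            # step 2: repeat (bounded here by assumption)
        s.append(x)              # step 3: output x on edge s
        b.append(factor * x)     # step 4: output factor times x on edge b
        x = b.popleft()          # step 5: remove the token on edge b
    return s


tokens = powers_task(1, 2, 32)
```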


Figure 5a - Task graph to compute the powers of 2


Figure 5b - Task graph to compute 2^i 3^j 5^k


Figure 5c - A snapshot of task graph 5b


Figure 5b shows three instances of the task graph wired together.
The tokens issued on the rightmost s edge are the solution to
Hamming's Problem for the set of primes {2, 3, 5}. The instruction
set for the second and third tasks does require a slight modification.
On each iteration, except the first, the tasks have a choice of
removing a token from either of the incoming s or b edges. By removing
the smaller of the two values, the tasks order the values output on
the outgoing s edge. Figure 5c shows a snapshot of the task graph in
Figure 5b.


352 J. T. Feo 


3.2. A Sisal solution 


In this section we explain how to express the task graph shown in 
Figure 5b in Sisal. We use a for initial expression and streams to 
spawn a task per prime and establish a FIFO queue between tasks i and 
i + 1, i ≥ 1. The Sisal code is


for initial
    i := 0;
    s_stream := stream [1]
while i < array_size(primes) repeat
    i := old i + 1;
    s_stream := powers(n, primes[i], old s_stream)
returns value of s_stream
end for


The initialization clause defines i and s_stream, an integer stream 
of one element. The loop body is executed iteratively, once for each 
prime. The body increments i and calls the function powers which 
takes three arguments: n, the ith prime, and old s_stream (the value 
of s_stream on the previous iteration). The value of the for initial 
expression is the value of s_stream defined on the last iteration (i.e., 
returned by the last call to powers). The for initial expres- 
sion invokes an instance of powers for each prime and establishes the 
FIFO queue s_stream between successive instances. If streams are 
implemented lazily, as is our intention, the instances will exist
concurrently, exploiting the producer/consumer parallelism within the task
graph. 


The function powers implements the task graph shown in Figure 
5a. At one time it was thought that Sisal could not express such cyclic 
computations, but this is not correct. Sisal can express cyclic compu- 
tations under three conditions: 1) the initial conditions of the input 
edges are known (i.e., the computation begins determinately), 2) the 
cyclic computation can be unrolled into a sequence of tasks, and 3) 




task output is synchronized with task input (i.e., a task can always wait 
for the next input before issuing the next output). The task graph in 
Figure 5a satisfies all three requirements. 


The Sisal code for powers is given in the Appendix. Each iteration 
of the for initial expression removes the smaller of the values (call 
it token) at the head of old s_stream and old b_stream, appends the
token to the output stream, defines a new s_stream, and defines a new 
b_stream with token * prime appended at the end. Since the task may 
exhaust s_stream, a test for stream_empty is necessary. On the other 
hand, the task can never exhaust b_stream since it appends a new 
value to the stream each iteration. 
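The following Python sketch mirrors the structure described above; it is not the Sisal code from the Appendix. The name powers comes from the text, while hamming, the use of eager lists and deques in place of lazy streams, and the decision to drop products greater than n (so that b can empty and the loop ends) are assumptions of this sketch.

```python
from collections import deque


def powers(n, prime, s_stream):
    """Merge the incoming s stream with a feedback stream of multiples of prime."""
    s = deque(s_stream)          # incoming s edge (already in increasing order)
    b = deque()                  # feedback b edge
    out = []                     # outgoing s edge
    while s or b:
        # Remove the smaller of the two head values; s may be exhausted.
        if not b or (s and s[0] <= b[0]):
            token = s.popleft()
        else:
            token = b.popleft()
        out.append(token)
        if token * prime <= n:   # assumption: discard values greater than n
            b.append(token * prime)
    return out


def hamming(n, primes):
    """One powers stage per prime, each consuming the previous stage's output."""
    s_stream = [1]
    for p in primes:
        s_stream = powers(n, p, s_stream)
    return s_stream


answer = hamming(10, [2, 3, 5])
```

Each stage emits its tokens in increasing order, so the feedback queue is also increasing and a comparison of the two heads is enough to keep the output sorted. With lazy streams, as intended for Sisal, the stages would run concurrently instead of one after another.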


Notice how different the functional and imperative solution pro- 
cesses are. The functional programmer first specifies the computation 
logically and then translates the specification into code. He thinks in 
terms of expressions that, when executed, blossom into the needed 
task graphs. The scheduling of tasks, allocation of memory, and the 
synchronized access to shared data are implicit—the details are han- 
dled automatically by the compiler and runtime system. The impera- 
tive programmer might begin solving Hamming’s Problem by first 
defining FIFO queues, a set of queue operations, and an access proto- 
col for readers and writers. Then after writing code for the task bod- 
ies, he would explicitly wire together the tasks using the queues being 
careful not to violate the semantics of the operations or the access 
protocol. Ensuring correctness is entirely his responsibility. The
compiler and runtime system provide little or no support. Since it is easy
to introduce subtle time-dependent errors into imperative parallel 
programs, the programming process is difficult, frustrating, and error 
prone. 


Figure 6 - Paraffins of size 1, 2, 3, and 4
(panels: Methane, Ethane, Propane, Butane, Iso-Butane)


Figure 7 - Isomers and duplicates


4. The Paraffins Problem 
4.1. A solution based on oriented trees


A paraffin is a hydrocarbon molecule with the chemical formula 
$C_nH_{2n+2}$. The Paraffins Problem asks us to output in increasing size the
chemical structure of all paraffin molecules with n or fewer carbon
atoms, including all isomers but no duplicates. Figure 6 shows the
paraffins of size 1, 2, 3, and 4. Since the placement of the carbon atoms
uniquely defines the placement of the hydrogen atoms, we draw only 
the former. Isomers are different arrangements of the same number 
of carbon atoms. They have the same chemical formula, but different 
chemical properties. An isomer is a different arrangement of atoms, 
and not merely a rotation or reflection of a set of atoms (Figure 7). 


Sisal 355 


Figure 8 - Oriented trees of size 1, 2, 3, and 4


In [8], Turner presents a functional solution to the Paraffins Pro- 
blem that first generates a list of paraffins and then filters out dupli- 
cates. Removing the duplicates is expensive and greatly increases the 
execution time of Turner’s solution. A more efficient functional algo- 
rithm exists based on oriented and free trees [3]. Since this algorithm 
generates no duplicates, it does not require any post-processing. 


An oriented tree is a connected, directed, acyclic graph. The tree 
includes a unique node called the root. All nodes other than the root 
are the source of a single arc, and there exists a unique path from ev- 
ery node to the root. The root is the source of no arc and is the sink 
of one or more arcs. The maximum number of arcs incident on any 
node is called the fan-in. Figure 8 shows the oriented trees of size 1, 
2, 3, and 4 with fan-in less than 4. We define the relation < on
oriented trees as follows: let $T_{ij}$ be the j-th oriented tree of size i; then


$T_{ij} < T_{kl} \iff (i < k) \lor (i = k \land j < l)$.
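The relation is a lexicographic order on the pair (size, index). As a minimal sketch in Python (the function name `precedes` and the pair encoding are illustrative assumptions, not part of the Sisal program), it can be written as:

```python
def precedes(t1, t2):
    """Return True if tree t1 comes strictly before tree t2.

    Each tree is identified by the pair (size, index within its size class),
    i.e. (i, j) for T_ij and (k, l) for T_kl.
    """
    (i, j), (k, l) = t1, t2
    return i < k or (i == k and j < l)
```

Python compares tuples lexicographically, so `precedes(t1, t2)` agrees with the built-in comparison `t1 < t2` on such pairs.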


Careful inspection of Figure 8 yields an efficient dynamic program- 
ming algorithm for constructing oriented trees of size n with fan-in 
three from the trees of size less than n with fan-in three. Logically, 
the algorithm is: 


1. repeat for all choices of c, d, e, f, g, and h

2. draw a root with fan-in three (call the three edges
   left, bottom, and right)

3. choose three oriented trees $T_{cd}$, $T_{ef}$, and $T_{gh}$, possibly
   of size zero, such that

   $(c + e + g = n - 1) \land (T_{cd} \le T_{ef} \le T_{gh})$    (4.1)

4. attach $T_{cd}$, $T_{ef}$, and $T_{gh}$ to the left, bottom, and right
   edges, respectively.


Figure 9 illustrates how the algorithm constructs oriented trees of size 
four. 
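The construction can be sketched in Python under a few stated assumptions: a tree is encoded as the tuple of its non-empty subtrees, `None` stands for the tree of size zero, and the names `size_triples` and `oriented_trees` are ours. This is an illustration of the logical algorithm, not the chapter's Sisal code.

```python
def size_triples(n):
    """Subtree sizes (c, e, g) with c <= e <= g and c + e + g == n - 1."""
    return [(c, e, n - 1 - c - e)
            for c in range((n - 1) // 3 + 1)
            for e in range(c, (n - 1 - c) // 2 + 1)]


def oriented_trees(n):
    """Return trees, where trees[s] lists the oriented trees of size s.

    Only trees with fan-in at most three are built. A tree is the tuple of
    its non-empty subtrees; None is the empty tree (size zero).
    """
    trees = [[None], [()]]          # size 0 and size 1
    for size in range(2, n + 1):
        level = []
        for c, e, g in size_triples(size):
            for j, tc in enumerate(trees[c]):
                for k, te in enumerate(trees[e]):
                    if c == e and k < j:
                        continue    # would violate T_cd <= T_ef
                    for l, tg in enumerate(trees[g]):
                        if e == g and l < k:
                            continue    # would violate T_ef <= T_gh
                        level.append(tuple(t for t in (tc, te, tg)
                                           if t is not None))
        trees.append(level)
    return trees
```

Because every triple of subtrees is chosen in non-decreasing order, each tree is produced exactly once and no filtering pass is needed.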


Paraffins with n or fewer carbons can be built from oriented trees of 
size less than or equal to n/2. The carbon atoms of a paraffin molecule 
form a free tree—an acyclic, connected graph with undirected edges. 
The tree’s centroid is the node or nodes of minimum weight, where a 
node’s weight is the size of its largest subtree. An important feature of 
a free tree is that its centroid is unique. If the number of nodes in the 
tree is odd, the centroid is a single node; if the number of nodes is 
even, the centroid is either a single node or two adjacent nodes. This 
fact motivates an efficient parallel algorithm for constructing paraffins 
of size n from the oriented trees of size less than n/2. Logically, the 
algorithm is: 


Single Centroid: 


1. repeat for all choices of a, b, c, d, e, f, g, and h


2. draw a root with fan-in four (call the four edges top,
   left, bottom, and right)


3. choose four oriented trees $T_{ab}$, $T_{cd}$, $T_{ef}$, and $T_{gh}$,
   possibly of size zero but less than n/2, such that

   $(a + c + e + g = n - 1) \land (T_{ab} \le T_{cd} \le T_{ef} \le T_{gh})$    (4.2)

4. attach $T_{ab}$, $T_{cd}$, $T_{ef}$, and $T_{gh}$ to the top, left, bottom,
   and right edges, respectively.




[Panel labels: (0, 0, 3), (0, 0, 3), (0, 1, 2), (1, 1, 1)]


Figure 9 - Constructing oriented trees of size 4


Double Centroid: 
1. repeat for all choices of b and d


2. choose two oriented trees $T_{ab}$ and $T_{cd}$
3. join together the roots of the two trees.


Figure 10 illustrates how the algorithm would construct paraffins of 
size 6. 


The construction process must use oriented trees because it must 
consider each arrangement of nodes once per distinguishable set of 
nodes in the arrangement. For n = 3, there are two sets of distin- 
guishable nodes: the end nodes and the interior node. There are two 
oriented trees: one whose root is an end node, and one whose root is 
the interior node. Note that there is only one free tree of size 3 (pro- 
pane). Joining together the roots of the two oriented trees in all three 
ways—end node to end node, end node to interior node, and interior 
node to interior node—constructs the three isomers of size 6 with 
double centroid. 
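The centroid argument can be checked numerically. The following Python sketch counts, rather than constructs, the oriented trees and paraffins; the function names are our own, and the counting shortcut (multisets of equal-sized subtrees counted with binomial coefficients) is a substitute for the explicit enumeration described in the text.

```python
from itertools import combinations_with_replacement, groupby
from math import comb


def multiset_count(sizes, r):
    """Ways to fill slots of the given (sorted) sizes with oriented trees,
    where slots of equal size are interchangeable. r[s] is the number of
    oriented trees of size s."""
    total = 1
    for size, group in groupby(sizes):
        k = len(list(group))
        total *= comb(r[size] + k - 1, k)   # multisets of k trees out of r[size]
    return total


def radicals(limit):
    """r[s] = number of oriented trees of size s with fan-in at most three."""
    r = [1, 1]                              # the empty tree and the single node
    for n in range(2, limit + 1):
        r.append(sum(multiset_count(s, r)
                     for s in combinations_with_replacement(range(n), 3)
                     if sum(s) == n - 1))
    return r


def paraffins(n):
    """Number of paraffins with exactly n carbon atoms."""
    half = n // 2
    r = radicals(half)
    max_sub = (n - 1) // 2                  # largest subtree size below n/2
    single = sum(multiset_count(s, r)
                 for s in combinations_with_replacement(range(max_sub + 1), 4)
                 if sum(s) == n - 1)
    double = comb(r[half] + 1, 2) if n % 2 == 0 else 0
    return single + double
```

For n = 6 the sketch finds two single-centroid and three double-centroid molecules, which matches the three double-centroid isomers described above.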




[Panels: Single Centroid, labeled (0, 1, 2, 2) and (1, 1, 1, 2); Double Centroid]


Figure 10 - Constructing paraffins of size 6


4.2. Constructing paraffins in Sisal 


Writing Sisal functions to implement the algorithms described in 
the previous section is straightforward. We represent both oriented 
trees and free trees (paraffins) as character strings; i.e., 


type tree = array[character]


For the first five paraffins (Figure 6), our program generates the char- 
acter strings 


(C) ((C) (C)) (C(C) (C)) ((C(C)) (C(C))) (C(C) (C) (C)) 


If the string is an oriented tree, then the first C is the root of the tree 
and the parenthesized lists of Cs at the same level as the root are the 
subtrees. If the string is a paraffin with a single centroid, the first C is 
the centroid. If the string is a paraffin with a double centroid, then 




the string divides into two strings with equal number of Cs. The cen- 
troid is the first C in each half. 
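To make the notation concrete, here is a small Python sketch that renders trees in this format. It assumes a tree is encoded as a tuple of its subtrees (an encoding chosen for illustration); `render` and `render_double` are hypothetical helper names.

```python
def render(tree):
    """Render an oriented tree, or a single-centroid paraffin, where
    `tree` is the tuple of the subtrees hanging off the root."""
    return "(C" + " ".join(render(sub) for sub in tree) + ")"


def render_double(left, right):
    """Render a double-centroid paraffin from its two halves."""
    return "(" + render(left) + " " + render(right) + ")"
```

With this encoding, methane is `render(())`, propane is `render(((), ()))`, and butane is `render_double(((),), ((),))`.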


We store the oriented trees and free trees (isomers) of size i as an 
array of trees, 


type TreeArray1 = array[tree]


and the oriented trees and free trees (paraffin molecules) of size ≤ n as
an array of arrays of type tree,


type TreeArray2 = array[TreeArray1]


The main function Paraffins takes an integer argument, n, and
returns one value of type TreeArray2. Since the number of isomers
varies with the number of carbon atoms, the result array will be
ragged. The body of the function is the let expression,


let Trees := OrientedTrees(n/2)
in  for i in 1, n
        isomers := if mod(i, 2) = 1 then
                       OneCentroid(i, Trees)
                   else
                       OneCentroid(i, Trees) ||
                       TwoCentroid(i, Trees)
                   end if
    returns array of isomers
    end for
end let


The let clause defines the oriented trees and the in clause builds the 
paraffins of size 1 through n. The result array is built in parallel and 
returned in order. 


The function OrientedTrees (see the Appendix) implements the
dynamic programming algorithm described in the previous section.
The function, a for initial expression, takes n as input and returns
the oriented trees of size [0 ... n/2]. The initialization clause defines
the oriented trees of size 0, 1, and 2, while the i-th iteration of the
body, 3 ≤ i ≤ n/2, builds the oriented trees of size i from the oriented
trees of size less than i (i.e., old Trees). The trees are built by xTrees
called from within the double-nested for expression


for c in 0, (i - 1)/3 cross
    e in c, (i - 1 - c)/2


We use the same names in the expression as in the logical description
in the previous section. The ranges for c and e are set such that an
instance of xTrees is invoked for each combination of tree sizes
satisfying Equation 4.1. Since the values of c and e define the value of g, no
third loop is required.


XTrees takes a, c, e, g, and Trees as input, and returns an array of
trees. The function consists of three expressions (see the Appendix).
The first expression computes the cross product of the sets Trees[c]
and Trees[a], returning an array of arrays of tree pairs. The (b, d)-th
element of the array is the pair {Trees[c,d], Trees[a,b]}. The second
expression computes the cross product of the sets Trees[e] and
Trees[g]. The third expression then takes the results of the first two
expressions, computes their cross product, and builds the new trees.


An important feature of the function is that it creates no duplicates.
The caller guarantees that a ≤ c ≤ e ≤ g; therefore, duplicates can
occur only if two adjacent parameters are equal, in which case the
cross product of their sets is symmetric. We can eliminate the
symmetry, and thus the duplicates, by structuring the cross products as


(Trees[c] x Trees[a]) x (Trees[e] x Trees[g])

and including the following tests:


1. if c = a, then return only the lower triangle of the cross
   product (Figure 11a); else, return the entire product.


Trees[a] = Trees[c] = Trees[e] = Trees[g] = {x, y, z}


(a)                      (b)

(x,x)                    (x,x) (x,y) (x,z)
(y,x) (y,y)                    (y,y) (y,z)
(z,x) (z,y) (z,z)                    (z,z)


(c)

(x,x,x,x) (x,x,x,y) (x,x,x,z)
(y,x,x,x) (y,x,x,y) (y,x,x,z) (y,x,y,y) (y,x,y,z)
(y,y,x,x) (y,y,x,y) (y,y,x,z) (y,y,y,y) (y,y,y,z)
(z,x,x,x) (z,x,x,y) (z,x,x,z) (z,x,y,y) (z,x,y,z) (z,x,z,z)
(z,y,x,x) (z,y,x,y) (z,y,x,z) (z,y,y,y) (z,y,y,z) (z,y,z,z)
(z,z,x,x) (z,z,x,y) (z,z,x,z) (z,z,y,y) (z,z,y,z) (z,z,z,z)


Figure 11 - The results of the expressions in Xtrees for a = c = e = g


2. if e = g, then return only the upper triangle of the cross
   product (Figure 11b); else, return the entire product.


3. if c = e, then return only the lower triangle of the cross
   product (Figure 11c); else, return the entire product.
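The effect of the first two tests can be illustrated with a short Python sketch. It covers only the pairwise case (one set crossed with itself); the function name `cross` and its `triangle` parameter are assumptions made for this example and do not appear in the Sisal program.

```python
def cross(xs, ys, triangle=None):
    """Cross product of two ordered sets as a list of pairs (x, y).

    Think of the product as a matrix with one row per element of xs and
    one column per element of ys. triangle="lower" keeps entries on or
    below the diagonal, triangle="upper" keeps entries on or above it,
    and triangle=None keeps the entire product.
    """
    pairs = []
    for row, x in enumerate(xs):
        for col, y in enumerate(ys):
            if triangle == "lower" and col > row:
                continue
            if triangle == "upper" and col < row:
                continue
            pairs.append((x, y))
    return pairs
```

When both operands are the same set, either triangle contains every unordered pair exactly once, so the symmetric half of the product is never generated.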


The function OneCentroid (see the Appendix) takes i and the array
of oriented trees, and returns the free trees of size i with one
centroid. The triple-nested range generator


for a in 0, (i - 1)/4 cross
    c in a, (i - 1 - a)/3 cross
    e in max(c, i/2 - a - c), (i - 1 - a - c)/2


creates an instance of the body (a call to xTrees) for all combinations
of tree sizes satisfying Equation 4.2. Since the values of a, c, and e set
the value of g, no fourth loop is required. TwoCentroid (see the
Appendix) takes i and the array of oriented trees, and returns the free
trees of size i with a double centroid. The double-nested range
generator


for b in 1, array_size(Trees[i/2]) cross
    d in b, array_size(Trees[i/2])


spawns an instance of the array build operation for every pair of
oriented trees of size i/2. Both functions build their respective tree
structures without duplicates and in order.


Our solution relies heavily on the ragged structure of Sisal’s arrays
and the semantics of the for expression’s range generator and
returns clause. With careful thought and organization, we have been able
to implement the complicated combinatorial requirements of the
problem without generating duplicates. The Sisal solution is parallel,
and executes in minimal space and time.


5. The Doctor's Office 
5.1. A logical view of the Doctor's Office 


Given a list of patients and doctors, model the following system.
Originally, all patients are well and all doctors are available. Doctors 
await patients at their office in a FIFO queue. At random times, pa- 
tients become sick, travel to the doctor’s office, and wait in a FIFO 
queue to see a doctor. A nurse pairs the first patient and the first doc- 
tor in line, and assigns them an examination room. In the examination 
room, the doctor cures the patient in a random amount of time. The 
patient then rejoins the world and the doctor returns to the nurse’s 
station. Figure 12 shows a logical view of the Doctor’s Office com- 
prised of three tasks and four edges. The edges act as FIFO queues of 
zero or more values. 




As in Hamming’s Problem, we are faced with implementing a cyclic 
computation. Again we use a for initial expression and rely on the 
expression’s loop semantics to satisfy the cyclic dependencies. We 
use arrays, not streams, to implement the FIFO queues for reasons that 
will become apparent shortly. Unlike Hamming’s Problem, only one of 
the three conditions listed in Section 3.2 is satisfied—the initial con- 
dition of each edge is known. 


Patient_In - empty
Patient_Out - the list of patients
Doctor_Out - the list of doctors
Patient_Doctor - empty


The semantics of the for initial expression implies a sequence of 
tasks and synchronization of task input/output that violates the spirit, 
if not the letter, of the specification. Ideally, the three tasks should 
execute and communicate asynchronously without any constraints. 


Unfortunately, Sisal excludes all forms of asynchrony. Its functions
are determinate. Outputs depend only on inputs regardless of
architecture, operating system, system load, or program state. The
consumer of a stream cannot test the stream for data availability; if it
could, the consumer could be programmed to take different actions
depending on whether or not data had arrived. The function’s outputs would
then depend on the execution speed of the producer, the speed and
congestion of the communication network, and the scheduling policy
of the operating system. Once the consumer tries to read the next
stream value, it must wait until that value arrives. To solve the
Doctor’s Office to the letter of the specification, the three tasks must
continue to execute whether or not new data arrives on all their input
edges. That is, patients must become sick whether or not cured
patients return from the doctor's office, patient-doctor pairs must leave
the nurse’s station whether or not new patients or doctors arrive, and
patients must be cured whether or not new patient-doctor pairs begin
treatment. In Section 5.3 we address this dilemma and propose a
reasonable solution.


[Task and edge labels: Well_Person, Examinations; Patient_Out, Patient_In, Doctor_Out, Patient_Doctor]


Figure 12 - A logical view of the Doctor's Office Problem


5.2. The main function 


We define two new types 


type queue = array[integer];
type queue2 = array[array[integer]];

and write the main function as


for initial
    seed := 0;
    patient_in := array queue [];
    patient_out := list_of_patients;
    doctor_out := list_of_doctors;
    patient_doctor := array queue2 []
while true repeat
    seed := next_seed(old seed);
    patient_in := well_person(seed, old patient_out);
    patient_doctor := nurse(old patient_in, old doctor_out);
    patient_out, doctor_out := examinations(seed, old patient_doctor)
returns stream of patient_in
        stream of doctor_out
end for

end function


patient_in, patient_out, and doctor_out are of type queue; and
patient_doctor is of type queue2. The initialization segment defines the
initial values of the four arrays and seed. The latter is used by
well_person and examinations to drive a random number generator.
The three tasks in the body execute independently, consuming the
edge values defined on the previous iteration. The expression returns
a stream of the patient queues, and a stream of the doctor queues.


The expression as written is not correct. Notice that well_person
takes as input old patient_out, the patient(s) cured on the previous
iteration and who are now rejoining the array of well patients. All the
patients that were well on the previous iteration and did not fall sick
are lost. The state has not been retained. The well patients, the
patients and doctors waiting at the nurse’s station, and the patients and
doctors in the examination rooms are persistent sets of data which are
not created and consumed in a single action. Since Sisal functions are
side-effect free and do not retain state between invocations, we must
explicitly maintain and circulate the state from iteration to iteration.


Thus well_person must return two sets or arrays: the new sick
patients, and the array of patients that are still well. nurse must return
three arrays: the new patient-doctor pairs, sick patients still waiting
for doctors, and available doctors still waiting for patients.
examinations must return three arrays: newly cured patients, newly available
doctors, and patient-doctor pairs still in the examination rooms.
Figure 13 shows the new edges (arrays) and the catenate operations
necessary to reassemble state. The new main function is




for initial
    seed := 0;
    still_well := list_of_patients;
    patient_out := array queue [];
    still_sick := array queue [];
    patient_in := array queue [];
    still_available := list_of_doctors;
    doctor_out := array queue [];
    still_examining := array queue2 [];
    patient_doctor := array queue2 []
while true repeat
    seed := next_seed(old seed);
    still_well, patient_in :=
        well_person(seed, old still_well || old patient_out);
    still_sick, patient_doctor, still_available :=
        nurse(old still_sick || old patient_in,
              old still_available || old doctor_out);
    still_examining, patient_out, doctor_out :=
        examinations(seed, old still_examining || old patient_doctor)
returns stream of patient_in
        stream of doctor_out
end for


For simplicity, we have pushed the catenate operations into the pa- 
rameter lists of the functions. 
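The circulation of state can be mimicked in Python as a rough sketch. Everything below is an assumption made for illustration: the task bodies are simplified stand-ins for the Appendix code, a single `random.Random` generator replaces the seed threading, the 0.5 cure probability is invented, and the loop runs for a fixed number of steps instead of forever.

```python
import random


def well_person(rng, patients):
    """Return (still_well, newly_sick); at most one patient falls sick."""
    if not patients or rng.random() >= 0.7:
        return patients, []
    k = rng.randrange(len(patients))
    return patients[:k] + patients[k + 1:], [patients[k]]


def nurse(patients, doctors):
    """Return (still_sick, new_pairs, still_available)."""
    if patients and doctors:
        return patients[1:], [(patients[0], doctors[0])], doctors[1:]
    return patients, [], doctors


def examinations(rng, pairs):
    """Return (still_examining, cured, freed); at most one pair finishes."""
    if not pairs or rng.random() >= 0.5:
        return pairs, [], []
    patient, doctor = pairs[0]
    return pairs[1:], [patient], [doctor]


def simulate(patients, doctors, steps, seed=0):
    rng = random.Random(seed)
    still_well, patient_out = list(patients), []
    still_sick, patient_in = [], []
    still_available, doctor_out = list(doctors), []
    still_examining, patient_doctor = [], []
    for _ in range(steps):
        # All three tasks read only the previous iteration's values ...
        well = well_person(rng, still_well + patient_out)
        desk = nurse(still_sick + patient_in, still_available + doctor_out)
        exam = examinations(rng, still_examining + patient_doctor)
        # ... and only then are the loop names rebound, as "old" implies.
        still_well, patient_in = well
        still_sick, patient_doctor, still_available = desk
        still_examining, patient_out, doctor_out = exam
    in_rooms = still_examining + patient_doctor
    return {
        "patients": still_well + patient_out + still_sick + patient_in
                    + [p for p, _ in in_rooms],
        "doctors": still_available + doctor_out + [d for _, d in in_rooms],
    }
```

Because every edge value is consumed by exactly one task and every task returns what it did not use, no patient or doctor is ever lost, whatever the random draws.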


5.3. The three tasks 


Tasks of iteration i cannot execute until the tasks of iteration (i - 1)
have completed. The loop-carried dependencies impose a constraint
on task execution not specified in the problem description. As
explained in Section 5.1, the tasks of iteration (i - 1) must generate
some output for the tasks of iteration i to execute; otherwise, those
tasks will hang waiting for data. If we insisted that a patient becomes
sick or is cured every iteration, we would further violate the problem’s
specifications. Instead, we force each task to issue either an empty
array (a ghost) of the appropriate type, or a single-element array.
Issuing single-element arrays is not a constraint imposed by the language.
We could have written the tasks to issue any number, even a random
number, of patients or patient-doctor pairs. Empty arrays are
removed automatically from the system by the catenate operations.


[Task and edge labels: Well_Person, Examinations; Still_Well, Patient_In, Still_Sick, Patient_Doctor, Still_Examining, Still_Available, Doctor_Out, Patient_Out]


Figure 13 - The Sisal solution to the Doctor's Office Problem


The code for well_person is 


function well_person(seed: integer; patients: queue
                     returns queue, queue)
    let
        x := random(seed);
        size := array_size(patients);
        sick := floor(real(size) * x / 0.7) + 1
    in
        if size = 0 | x >= 0.7 then
            patients, array queue []
        else
            array_remh(patients[sick: patients[size]]),
            array [1: patients[sick]]
        end if
    end let
end function


The let clause defines a random number x, computes the number of 
well patients, and the index of the patient who may fall sick (call him 
Bob). If there are no well patients (size = 0) or no patient falls sick (x 
>= 0.7), the function returns the input array patients and an empty 
array. Otherwise, the function removes Bob from the array of well 
patients by “replacing” his identification number with the identifica- 
tion number of the last person in the array, and then removing the last 
person. The resulting array is returned as the function’s first result. 
Bob's identification number is placed in an array and returned as the 
function’s second result. 
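The replace-and-truncate step may be easier to follow in a Python sketch. It takes the random draw `x` as an explicit argument so that the behaviour is reproducible, uses 0-based indexing instead of Sisal's 1-based arrays, and adds a guard against floating-point round-off; these choices are ours, not the chapter's.

```python
from math import floor


def well_person(x, patients):
    """Return (still_well, newly_sick) for a random draw x in [0, 1)."""
    size = len(patients)
    if size == 0 or x >= 0.7:
        return patients, []              # nobody falls sick this time
    # 0-based index of the patient who falls sick; min() guards against
    # round-off pushing the index to size.
    sick = min(size - 1, floor(size * x / 0.7))
    still_well = patients[:]             # copy: the input stays untouched
    still_well[sick] = patients[-1]      # overwrite him with the last patient
    still_well.pop()                     # then drop the last slot
    return still_well, [patients[sick]]
```

For example, `well_person(0.0, [10, 20, 30, 40])` removes patient 10 by moving 40 into his slot, giving `([40, 20, 30], [10])`.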


The code for nurse and examinations is similar. We refer the 
reader to the Appendix. 


While our solution to the Doctor’s Office is not perfect, it is close. 
Since Sisal explicitly excludes all forms of asynchrony, we could never 
hope to solve the problem exactly. The fact that we came as close as 
we did to modeling a real doctor’s office is a testimony to Sisal’s ro- 
bustness and generality. 


6. The Skyline Matrix Problem


6.1. Gaussian elimination without pivoting 


In this problem, we are asked to solve the linear system of equations


$A x = b$    (6.1)


without pivoting, where A is a skyline matrix. A skyline matrix has
nonzero values in row i in columns k through i, 1 ≤ k ≤ i, and nonzero
values in column j in rows k through j, 1 ≤ k ≤ j. The values of k are
stored in two vectors: row and column. Figure 14 depicts a skyline
matrix and its associated row and column vectors. Notice how the
nonzero values form a skyline both above and below the diagonal.


A traditional solution method for linear systems of equations is 
Gaussian elimination without pivoting. The method reduces A to the 
triangular matrix A’, and b to the column vector b’ such that 


A’ x= b' (6.2) 


A’ can be either upper or lower triangular. Equation 6.2 is then solved 
using a method known as back substitution. 


The Sisal algorithm developed in Section 6.3 forms a lower triangular
matrix. The reduction occurs in n - 1 steps, one step per row. To
eliminate the upper triangular matrix we step backwards from n to 1.
At step i, we reduce the system as follows


$A_{j,k} = A_{j,k} - (A_{i,k} \cdot A_{j,i} / A_{i,i}), \quad 1 \le j < i,\ 1 \le k \le i$    (6.3.1)

and

$b_j = b_j - (b_i \cdot A_{j,i} / A_{i,i}), \quad 1 \le j < i$    (6.3.2)


The elements in rows i through n, and in columns i + 1 through n
remain unchanged. The reduction’s effect is to set the values in column
i above the diagonal to zero, thereby eliminating the ith column in
the upper triangular matrix. Row i and column i are referred to as the
pivot row and pivot column, respectively. $A_{i,i}$ and $b_i$ are referred to as
the pivot element of A and the pivot element of b, respectively. On
completion, A’ will consist of the n pivot rows, and b’ will consist of
the n pivot elements of b.




row = [1, 2, 2, 4, 2, 4, 7]    column = [1, 2, 3, 1, 3, 2, 7]


Figure 14 - A skyline matrix and associated row and column vectors


To solve for x, we first solve for $x_1$

$x_1 = b'_1 / A'_{1,1}$    (6.4.1)

Once we have $x_1$, we can solve for $x_2$

$x_2 = (b'_2 - A'_{2,1} x_1) / A'_{2,2}$    (6.4.2)

and then for $x_3$

$x_3 = (b'_3 - A'_{3,1} x_1 - A'_{3,2} x_2) / A'_{3,3}$    (6.4.3)


and so on. 
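The whole method of this section fits in a few lines of Python when the matrix is stored densely. The sketch below is an illustration only: it ignores the skyline structure, uses 0-based indices, assumes every pivot element is nonzero, and the name `solve_no_pivot` is ours.

```python
def solve_no_pivot(A, b):
    """Solve A x = b by eliminating the upper triangle, then substituting.

    A is a list of rows; the inputs are copied and left unmodified.
    """
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    # Reduction: step backwards over the pivot rows (Equations 6.3).
    for i in range(n - 1, 0, -1):
        for j in range(i):
            factor = A[j][i] / A[i][i]
            for k in range(i + 1):
                A[j][k] -= A[i][k] * factor
            b[j] -= b[i] * factor
    # A is now lower triangular: solve for x_1, x_2, ... in turn (Equations 6.4).
    x = []
    for i in range(n):
        known = sum(A[i][k] * x[k] for k in range(i))
        x.append((b[i] - known) / A[i][i])
    return x
```

Substituting the returned vector back into the original system is a quick way to check the result.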


6.2. An efficient representation of a skyline matrix in Sisal 


The key to an efficient implementation of the Skyline Matrix
Problem is eliminating the zeros at the beginning of each row and
column, and eliminating the computations involving those zeros. Sisal’s
ragged array structure is ideally suited for this problem. Since a
“two-dimensional” array in Sisal is an array of arrays, and since each
component array can have a different size and lower bound, we can
eliminate the zeros at the head of each row by setting the lower bound of
component i to row[i]. Recall that row and column store the location
of the first non-zero element in each row and column. Since the
component arrays must be contiguous, we can eliminate some, but not
all, of the zeros above the diagonals by setting the upper bound of each
component array to the index location of the last non-zero element in
the row. For example, the zeros in columns 5 through 7 of the first
row in the array shown in Figure 14 can be eliminated by setting the
upper bound of row 1 to 4; however, the zeros in columns 2 and 3
remain.

We can eliminate all the zeros by splitting A into its lower and
upper triangular submatrices and transposing the upper. We refer to the
former as L and the latter as U (even though it is really U-transpose).
Under this decomposition, all leading zeros fall at the head of rows
and can be eliminated by setting the lower bound of the ith
component of L and U to row[i] and column[i], respectively. Figure 15
shows such a decomposition. The Sisal code to build the two arrays is


L := for i in 1, n cross j in row[i], i
     returns array of A[i, j]
     end for;
U := for i in 1, n cross j in column[i], i
     returns array of A[j, i]
     end for;


Both expressions return a “two-dimensional” array of n rows. The
lower bound of each row is the first value of the inner range, either
row[i] or column[i], and the upper bound is i; thus, each row stores
only nonzero elements. The transpose in forming U is effected by re- 
versing i and j in the read. 
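Python lists have no adjustable lower bound, so a sketch of the same decomposition has to carry the bound explicitly. In the illustration below each ragged row is a pair (lower bound, values), indices are 0-based, and the function name `decompose` is an assumption of this example.

```python
def decompose(A, row, column):
    """Split a dense matrix A into ragged L and U (U stored transposed).

    row[i] and column[i] are the 0-based positions of the first nonzero
    in row i and column i. Each result row is (lower_bound, values) and
    covers positions lower_bound through i.
    """
    n = len(A)
    L = [(row[i], [A[i][j] for j in range(row[i], i + 1)]) for i in range(n)]
    U = [(column[i], [A[j][i] for j in range(column[i], i + 1)]) for i in range(n)]
    return L, U
```

Both L and U include the diagonal, exactly as the two Sisal expressions do, and element k of a ragged row is found at offset k minus the row's lower bound.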




Figure 15 - A skyline matrix decomposed into L and U 


6.3. An efficient algorithm based on the Sisal decomposition 


Having eliminated all the zeros that lie outside the skyline, we must
now develop an efficient algorithm that takes L, U, and b, and returns
A_prime and b_prime. Eliminating the lower triangle of A would
require us to work with the columns of L and U. Since Sisal arrays are
row-oriented, column-oriented algorithms are more complex and less
efficient. Instead, if we eliminate the upper triangle of A, we will work
with the rows of L and U, a much easier and more straightforward
proposition.


The following for initial expression implements the iterative 
algorithm described in Section 6.1, 


A_prime, b_prime :=
    for initial
        i := n;
        pivot_b := b[n];
        pivot_A := L[n];
        L1, U1, b1 := reduce(n, L, U, b)
    while i > 1 repeat
        i := old i - 1;
        pivot_b := old b1[i];
        pivot_A := old L1[i];
        L1, U1, b1 := reduce(i, old L1, old U1, old b1)
    returns array of pivot_A
            array of pivot_b
    end for


The expression steps backwards in single steps from n to 1. Each
iteration defines a new value of L1, U1, and b1 “reduce”d from the
arrays’ previous values, and contributes one element to A_prime and
b_prime. The contributed values are: L1[i], the ith pivot row, and
b1[i], the ith pivot element of b.


A common mistake that novice Sisal programmers make is to build
A_prime and b_prime in the body one element at a time, and carry the
partial arrays from iteration to iteration. They append each iteration’s
contribution to the partially built arrays using array_addh,


A_prime := array_addh(old A_prime, pivot_A);
b_prime := array_addh(old b_prime, pivot_b);


This complicates the expression, and is not necessary. Novices fail to 
understand that Sisal expressions return values, including arrays, as a 
consequence of their execution. They continue to think imperatively, 
describing both the “what” and “how” of the computation. 




The function reduce implements Equations 6.3.1 and 6.3.2.
Rewriting the expressions in terms of L and U, we have


$L_{j,k} = L_{j,k} - (L_{i,k} \cdot U_{i,j} / L_{i,i})$    (6.5.1)
$U_{j,k} = U_{j,k} - (L_{i,j} \cdot U_{i,k} / L_{i,i})$    (6.5.2)
$b_j = b_j - (b_i \cdot U_{i,j} / L_{i,i})$    (6.5.3)


where 1 ≤ j < i and 1 ≤ k ≤ j. We can express the three equations in
Sisal as


L1 := for j in 1, i - 1 cross k in 1, j
      returns array of
          old L1[j,k] - (old L1[i,k] * old U1[i,j] / old L1[i,i])
      end for;
U1 := for j in 1, i - 1 cross k in 1, j
      returns array of
          old U1[j,k] - (old L1[i,j] * old U1[i,k] / old L1[i,i])
      end for;
b1 := for j in 1, i - 1
      returns array of
          old b1[j] - (old b1[i] * old U1[i,j] / old L1[i,i])
      end for;


Notice that L1, U1, and b1 are one element smaller than their old
counterparts. Because old L1 and old U1 are compressed, some
elements may be missing. Reading a missing element will return an
error value, which we can test for using the intrinsic function
is_error. is_error(x) returns true if x is an error value; else, it returns
false.


If old U1[i,j] is missing (i.e., the pivot column element is zero),
Equations 6.5.1 and 6.5.3 reduce to


$L_{j,k} = L_{j,k}$
$b_j = b_j$


for 1 ≤ k ≤ j. If the value is present, then elements of the j-th row of L
will change, but only elements from the minimum of the lower bounds
of old L1[i] and old L1[j] to j will change. The other elements are
zero and will remain zero. If old L1[i,j] is missing (i.e., the pivot
row element is zero), Equation 6.5.2 reduces to


$U_{j,k} = U_{j,k}$


for 1 ≤ k ≤ j. If the value is present, then elements of the j-th row of U
will change, but only elements from the minimum of the lower bounds
of old U1[i] and old U1[j] to j will change. The other elements are
zero and will remain zero. The Appendix gives the revised Sisal
expression for L1, U1, and b1 in function reduce.


Once we have A_prime and b_prime, we can solve for x according to 
Equations 6.4. The Sisal code is straightforward and we leave it as an 
exercise for the reader (see the Appendix for our solution). 
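For readers who want to experiment outside Sisal, the ragged reduction and the final substitution can be sketched in Python. This is our own illustration, not the Appendix code: ragged rows are (lower bound, values) pairs with 0-based indices, a missing element reads as zero instead of as an error value, all pivots are assumed nonzero, and every function name is hypothetical.

```python
def ragged(rows):
    """Strip leading zeros: each row becomes (lower_bound, values)."""
    out = []
    for r in rows:
        lo = next(k for k, v in enumerate(r) if v != 0)
        out.append((lo, list(r[lo:])))
    return out


def at(M, j, k):
    """Element k of ragged row j; positions left of the bound read as 0."""
    lo, vals = M[j]
    return vals[k - lo] if k >= lo else 0.0


def updated_row(M, i, j, factor):
    """Row j of M minus factor times pivot row i, on positions up to j."""
    if factor == 0:
        return M[j]                      # pivot entry missing: row unchanged
    lo = min(M[j][0], M[i][0])           # the row may grow to the left
    return (lo, [at(M, j, k) - at(M, i, k) * factor for k in range(lo, j + 1)])


def reduce_step(i, L, U, b):
    """One reduction step with pivot row i (Equations 6.5.1 to 6.5.3)."""
    pivot = at(L, i, i)
    f_col = [at(U, i, j) / pivot for j in range(i)]   # pivot-column factors
    f_row = [at(L, i, j) / pivot for j in range(i)]   # pivot-row factors
    return ([updated_row(L, i, j, f_col[j]) for j in range(i)],
            [updated_row(U, i, j, f_row[j]) for j in range(i)],
            [b[j] - b[i] * f_col[j] for j in range(i)])


def solve_skyline(A, b):
    n = len(A)
    L = ragged([A[i][:i + 1] for i in range(n)])
    U = ragged([[A[k][i] for k in range(i + 1)] for i in range(n)])
    b = list(b)
    pivots = []
    for i in range(n - 1, -1, -1):       # collect pivot rows from last to first
        pivots.append((L[i], b[i]))
        L, U, b = reduce_step(i, L, U, b)
    pivots.reverse()
    x = []
    for i, ((lo, vals), b_i) in enumerate(pivots):    # Equations 6.4
        known = sum(vals[k - lo] * x[k] for k in range(lo, i))
        x.append((b_i - known) / vals[i - lo])
    return x
```

Note that a row can acquire new entries to the left of its old lower bound during the reduction, which is why `updated_row` recomputes the bound at every step.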


Despite Sisal’s high-level functional semantics, we have developed 
an efficient solution of the Skyline Matrix Problem which stores only 
nonzero elements and avoids all computation involving zero elements. 
Moreover, at any time, the program’s data structures store only essen- 
tial information. The belief that functional languages are unable to ex- 
press scientific computations, and that array operations in these lan- 
guages are unnatural and inefficient is just not true. The Sisal code is 
more efficient, easier to understand, and certainly closer to the math- 
ematics of the problem than the Fortran solution presented in [2]. 


7. Conclusions 


In this chapter we have presented Sisal solutions to the four
Salishan problems. All the solutions are parallel, and preliminary studies
show that compile-time analysis [1, 6] can eliminate all unnecessary
copying and memory management operations. We expect these
solutions to execute as fast as imperative solutions on conventional
multiprocessor systems. We were able to meet problem specifications in
three of the four cases. In the odd case, the Doctor’s Office, the
nondeterministic nature of the problem prevents us from implementing
the problem exactly as specified. Given the havoc which unintended
nondeterminism wreaks in the development of correct parallel
programs, we do not apologize for the “shortcoming” of determinacy in
Sisal.


We wish to leave the reader with three important facts. First, Sisal 
can express a wide variety of problems, not just scientific computa- 
tions. The language supports a robust set of array operations without 
violating functional semantics. 


Second, functional programming is more abstract than imperative 
programming. Since Sisal programs encode only the “what” and not 
the “how” of problem solutions, the Sisal programmer is not encum- 
bered by many tedious and picayune details. The mathematical seman- 
tics of Sisal provide the scientific programmer with a natural and fa- 
miliar medium in which to express his computations. We do not want 
to teach scientists yet another language; on the contrary, we want to 
return them to their mathematical roots. 


Third and most important, all four Sisal programs are parallel, de- 
terminate, and deadlock free. The codes will run on any computer 
system, regardless of topology or number of processors, in parallel and 
without rewriting. Yet at no time did we ever think about parallelism, 
communication, synchronization, or task scheduling. We simply de- 
signed a mathematical solution to the problem, and then translated it 
into Sisal code. Compare this to the imperative solutions presented 
in this book in which considerable time and effort is expended manag- 
ing parallelism, communication, synchronization, and task scheduling. 
Truly, writing parallel programs in a functional language is free. 




Acknowledgements 


I would like to thank Steven Skedzielewski who worked with me 
on designing the Sisal solutions for presentation at the 1988 Salishan 
Conference, and Tom DeBoni who proofread the chapter and suggest- 
ed numerous ways to improve the presentation of the algorithms. 


This work was supported (in part) by the Applied Mathematics
Program of the Office of Energy Research (U.S. Department of Energy) 
under contract No. W-7405-Eng-48 to Lawrence Livermore National 
Laboratory. 


Disclaimer 


This report was prepared as an account of work sponsored by the United States 
Government. Neither the United States nor the United States Department of Energy, 
nor any of their employees, nor any of their contractors, subcontractors, or their em- 
ployees, makes any warranty, express or implied, or assumes any legal liability or re- 
sponsibility for the accuracy, completeness or usefulness of any information, appara- 
tus, product or process disclosed, or represents that its use would not infringe privately- 
owned rights. 


References to a company or product name does not imply approval or recommenda-
tion of the product by the University of California or the United States Department of
Energy to the exclusion of others that may be suitable.


The views, opinions, and/or findings contained in this report are those of the au- 
thors and should not be construed as an official Department of the Army position, pol- 
icy, or decision, unless so designated by other documents. 


References 


1. Cann, D. C. Compilation Techniques for High Performance 
Applicative Computation. Ph.D. thesis, Department of Computer 
Science, Colorado State University, 1989. 


2. Eisenstat, S. C. and A. H. Sherman. Subroutines for envelope 
solution of sparse linear systems. Research Report 35, Yale 
University, New Haven, CT, October 1974. 


3. Knuth, D. The Art of Computer Programming, Vol. 1: Fundamental 
Algorithms. Addison-Wesley, Reading, MA, 1973. 


4. McGraw, J. R. et al. Sisal: Streams and iterations in a single- 
assignment language, Language Reference Manual, Version 1.1. 
Lawrence Livermore National Laboratory Manual M-146, Lawrence 
Livermore National Laboratory, Livermore, CA, June 1983. 


5. McGraw, J. R. et al. Sisal: Streams and iterations in a single- 
assignment language, Language Reference Manual, Version 1.2. 
Lawrence Livermore National Laboratory Manual M-146 (Rev. 1), 
Lawrence Livermore National Laboratory, Livermore, CA, March 
1985. 


6. Ranelletti, J. E. Graph Transformation Algorithms for Array 
Memory Optimization in Applicative Languages. Ph.D. thesis, 
Department of Computer Science, University of California at 
Davis/Livermore, 1987. 


7. Skedzielewski, S. K. and J. Glauert. IF1 - An intermediate form for 
applicative languages. Lawrence Livermore National Laboratory 
Manual M-170, Lawrence Livermore National Laboratory, 
Livermore, CA, July 1985. 


8. Turner, D. A. The semantic elegance of applicative languages. In 
Proceedings of the Conference on Functional Programming 
Languages and Computer Architecture, Portsmouth, NH, October 
1981, 85-92. 




Appendix 


/**********************************************/
/** Hamming's Problem, Extended              **/
/**********************************************/

define hamming

type OneDim  = array[integer];
type Istream = stream[integer];

function powers(n, prime: integer; in_stream: Istream
                returns Istream)

   for initial
      token    := stream_first(in_stream);
      s_stream := stream_rest(in_stream);
      b_stream := stream [token * prime]
   while token < n repeat
      token, s_stream, b_stream :=
         let
            s_token := stream_first(old s_stream);
            b_token := stream_first(old b_stream)
         in
            if stream_empty(old s_stream) then
               b_token,
               old s_stream,
               stream_append(stream_rest(old b_stream), b_token * prime)
            elseif b_token < s_token then
               b_token,
               old s_stream,
               stream_append(stream_rest(old b_stream), b_token * prime)
            else
               s_token,
               stream_rest(old s_stream),
               stream_append(old b_stream, s_token * prime)
            end if
         end let
   returns stream of token when token <= n
   end for

end function % powers


function hamming(n: integer; primes: OneDim
                 returns Istream)

   for initial
      i := 0;
      s_stream := stream [1]
   while i < array_size(primes) repeat
      i := old i + 1;
      s_stream := powers(n, primes[i], old s_stream)
   returns value of s_stream
   end for

end function % hamming
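
For readers without access to a Sisal compiler, the stream merge above can be approximated in Python. This is a minimal sketch rather than a translation of the Sisal semantics: it uses eager lists and `collections.deque` in place of lazy streams, and it assumes that `primes` holds distinct primes and that `in_stream` is in ascending order. The function names mirror the Sisal code; everything else is an illustrative choice.

```python
from collections import deque


def powers(n, prime, in_stream):
    """Merge an ascending in_stream with the multiples generated by `prime`.

    Returns, in ascending order, every value v * prime**k <= n
    for v in in_stream and k >= 0.
    """
    s_stream = deque(in_stream)   # values still to be passed through
    b_stream = deque()            # emitted values times prime, ascending
    out = []
    while s_stream or b_stream:
        # Take the smaller of the two heads; both queues are ascending.
        if not s_stream or (b_stream and b_stream[0] < s_stream[0]):
            token = b_stream.popleft()
        else:
            token = s_stream.popleft()
        if token > n:
            break                 # every value still queued is larger
        out.append(token)
        b_stream.append(token * prime)
    return out


def hamming(n, primes):
    """Apply `powers` once per prime, starting from the stream [1]."""
    stream = [1]
    for prime in primes:
        stream = powers(n, prime, stream)
    return stream
```

For instance, `hamming(20, [2, 3, 5])` yields `[1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20]`. As in the Sisal version, each prime adds one merge stage to the pipeline.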


/**********************************************/
/** Paraffins Problem                        **/
/**********************************************/

define Paraffins

type trees      = array [character];
type TreeArray1 = array [trees];
type TreeArray2 = array [TreeArray1];

function Xtrees(a, c, e, g: integer; Trees: TreeArray2
                returns TreeArray1)

   let
      CxA := if c = a then
                for d in 1, array_size(Trees[c]) cross
                    b in d, array_size(Trees[a])
                returns array of Trees[c, d] || Trees[a, b]
                end for
             else
                for d in 1, array_size(Trees[c]) cross
                    b in 1, array_size(Trees[a])
                returns array of Trees[c, d] || Trees[a, b]
                end for
             end if;

      ExG := if e = g then
                for f in 1, array_size(Trees[e]) cross
                    h in f, array_size(Trees[e])
                returns array of Trees[e, f] || Trees[g, h]
                end for
             else
                for f in 1, array_size(Trees[e]) cross
                    h in 1, array_size(Trees[g])
                returns array of Trees[e, f] || Trees[g, h]
                end for
             end if
   in
      if c = e then
         for d in 1, array_size(CxA) cross
             b in 1, array_size(CxA[d]) cross
             f in 1, d cross
             h in 1, array_size(ExG[f])
         returns value of catenate
            array [1: "(C" || CxA[d, b] || ExG[f, h] || ")"]
         end for
      else
         for d in 1, array_size(CxA) cross
             b in 1, array_size(CxA[d]) cross
             f in 1, array_size(ExG) cross
             h in 1, array_size(ExG[f])
         returns value of catenate
            array [1: "(C" || CxA[d, b] || ExG[f, h] || ")"]
         end for
      end if
   end let

end function % Xtrees

function OrientedTrees(n: integer returns TreeArray2)

   for initial
      i := 2;
      Trees := array [0: array [1: ""],
                         array [1: "(C)"],
                         array [1: "(C(C))"]]
   while i < n repeat
      i := old i + 1;
      set_i := for c in 0, (i - 1) / 3 cross
                   e in c, (i - 1 - c) / 2
                  g := i - 1 - c - e
               returns value of catenate Xtrees(0, c, e, g, old Trees)
               end for;
      Trees := array_addh(old Trees, set_i)
   returns value of Trees
   end for

end function % OrientedTrees

function OneCentroid(i: integer; Trees: TreeArray2
                     returns TreeArray1)

   for a in 0, (i - 1) / 4 cross
       c in a, (i - 1 - a) / 3 cross
       e in max(c, i/2 - a - c), (i - 1 - a - c) / 2
      g := i - 1 - a - c - e
   returns value of catenate Xtrees(a, c, e, g, Trees)
   end for

end function % OneCentroid

function TwoCentroid(i: integer; Trees: TreeArray2
                     returns TreeArray1)

   for b in 1, array_size(Trees[i/2]) cross
       d in b, array_size(Trees[i/2])
   returns value of catenate
      array [1: "(" || Trees[i/2, b] || Trees[i/2, d] || ")"]
   end for

end function % TwoCentroid


function Paraffins(n: integer returns TreeArray2)

   let
      Trees := OrientedTrees(n / 2)
   in
      for i in 1, n
         isomer := if mod(i, 2) = 1 then
                      OneCentroid(i, Trees)
                   else
                      OneCentroid(i, Trees) || TwoCentroid(i, Trees)
                   end if
      returns array of isomer
      end for
   end let

end function % Paraffins

/**********************************************/
/** Doctor's Office Problem                  **/
/**********************************************/

define doctors_office

type queue  = array [integer];
type queue2 = array [array [integer]];

global random (seed: integer returns real)

global next_seed (seed: integer returns integer)

function well_persons (seed: integer; patients: queue
                       returns queue, queue)

   let
      x    := random(seed);
      size := array_size(patients);
      sick := floor(real(size) * x / 0.7) + 1
   in
      if size = 0 | x >= 0.7 then
         patients, array queue []
      else
         array_remh(patients[sick: patients[size]]),
         array [1: patients[sick]]
      end if
   end let

end function % well_persons
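
The selection step in well_persons removes one patient by overwriting the chosen slot with the last element and then dropping the high end of the array, so no other elements need to shift. A minimal Python sketch of the same idea follows. It assumes the random draw `x` is passed in directly instead of being computed from a seed, and it uses 0-based list indices where the Sisal code uses 1-based arrays; both choices are illustrative.

```python
import math


def well_persons(x, patients):
    """Return (remaining patients, patients who fell sick this step).

    x is a random draw in [0, 1); a patient falls sick only when x < 0.7.
    """
    size = len(patients)
    if size == 0 or x >= 0.7:
        return list(patients), []
    # Scale x from [0, 0.7) onto a 0-based index in [0, size).
    sick = math.floor(size * x / 0.7)
    remaining = list(patients)
    chosen = remaining[sick]
    remaining[sick] = remaining[-1]   # move the last patient into the gap
    remaining.pop()                   # drop the high end, as array_remh does
    return remaining, [chosen]
```

The order of the remaining patients is not preserved, which matches the Sisal version: the queue of well persons is treated as an unordered pool.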


function nurse(patients: queue; doctors: queue
               returns queue, queue2, queue)

   let
      n_patients := array_size(patients);
      n_doctors  := array_size(doctors)
   in
      if (n_patients = 0) | (n_doctors = 0) then
         patients, array queue2 [], doctors
      else
         array_setl(array_reml(patients), 1),
         array [1: array [1: patients[1], doctors[1]]],
         array_setl(array_reml(doctors), 1)
      end if
   end let

end function % nurse


function examinations (seed: integer; in_exam: queue2
                       returns queue2, queue, queue)

   let
      x     := random(seed);
      size  := array_limh(in_exam);
      cured := floor(real(size) * x / 0.3) + 1
   in
      if size = 0 | x >= 0.3 then
         in_exam, array queue [], array queue []
      else
         array_remh(in_exam[cured: in_exam[size]]),
         array [1: in_exam[cured, 1]],
         array [1: in_exam[cured, 2]]
      end if
   end let

end function % examinations


function doctors_office (list_of_patients, list_of_doctors: queue
                         returns stream[queue], stream[queue])

   for initial
      seed            := 0;
      still_well      := list_of_patients;
      patient_out     := array queue [];
      still_sick      := array queue [];
      patient_in      := array queue [];
      still_available := list_of_doctors;
      doctor_out      := array queue [];
      still_examining := array queue2 [];
      patient_doctor  := array queue2 []
   while true repeat
      seed := next_seed(old seed);
      still_well, patient_in :=
         well_persons(seed, old still_well || old patient_out);
      still_sick, patient_doctor, still_available :=
         nurse(old still_sick || old patient_in,
               old still_available || old doctor_out);
      still_examining, patient_out, doctor_out :=
         examinations(seed, old still_examining || old patient_doctor)
   returns stream of patient_in
           stream of doctor_out
   end for

end function % doctors_office


/**********************************************/
/** Skyline Matrix Problem                   **/
/**********************************************/

define skyline

type OneDim = array[real];
type TwoDim = array[OneDim];
type IntDim = array[integer];

function form_skyline(n: integer; row, column: IntDim; A: TwoDim
                      returns TwoDim, TwoDim)

   for i in 1, n cross j in row[i], i
   returns array of A[i, j]
   end for,

   for i in 1, n cross j in column[i], i
   returns array of A[j, i]
   end for

end function % form_skyline


function reduce(i: integer; L1, U1: TwoDim; b1: OneDim
                returns TwoDim, TwoDim, OneDim)

   % reduce L1
   for j in 1, i - 1 returns array of
      if is error(U1[i, j]) then
         L1[j]
      else
         for k in min(array_liml(L1[i]), array_liml(L1[j])), j
         returns array of
            if is error(L1[i, k]) then
               L1[j, k]
            elseif is error(L1[j, k]) then
               - (L1[i, k] * U1[i, j]) / L1[i, i]
            else
               L1[j, k] - (L1[i, k] * U1[i, j]) / L1[i, i]
            end if
         end for
      end if
   end for,

   % reduce U1
   for j in 1, i - 1 returns array of
      if is error(L1[i, j]) then
         U1[j]
      else
         for k in min(array_liml(U1[i]), array_liml(U1[j])), j
         returns array of
            if is error(U1[i, k]) then
               U1[j, k]
            elseif is error(U1[j, k]) then
               - (L1[i, j] * U1[i, k]) / L1[i, i]
            else
               U1[j, k] - (L1[i, j] * U1[i, k]) / L1[i, i]
            end if
         end for
      end if
   end for,

   % reduce b1
   for j in 1, i - 1 returns array of
      if is error(U1[i, j]) then
         b1[j]
      else
         b1[j] - (b1[i] * U1[i, j]) / L1[i, i]
      end if
   end for

end function % reduce


function eliminate(n: integer; L, U: TwoDim; b: OneDim
                   returns TwoDim, OneDim)

   for initial
      i := n;
      pivot_b := b[n];
      pivot_A := L[n];
      L1, U1, b1 := reduce(n, L, U, b)
   while i > 1 repeat
      i := old i - 1;
      pivot_b := old b1[i];
      pivot_A := old L1[i];
      L1, U1, b1 := reduce(i, old L1, old U1, old b1)
   returns array of pivot_A
           array of pivot_b
   end for

end function % eliminate


function backsolve(n: integer; A_prime: TwoDim; b_prime: OneDim
                   returns OneDim)

   for initial
      i := n;
      j := 1;
      A := A_prime;
      b := b_prime;
      x := b[n] / A[n, 1]
   while i > 1 repeat
      i := old i - 1;
      j := old j + 1;
      b := for k in 1, i
              b_k := if is error(A_prime[k, old j]) then
                        old b[k]
                     else
                        old b[k] - old x * A_prime[k, old j]
                     end if
           returns array of b_k
           end for;
      x := b[i] / A[i, j]
   returns array of x
   end for

end function % backsolve

function skyline(n: integer; row, column: IntDim;
                 A: TwoDim; b: OneDim
                 returns OneDim)

   let
      L, U := form_skyline(n, row, column, A);
      A_prime, b_prime := eliminate(n, L, U, b)
   in
      backsolve(n, A_prime, b_prime)
   end let

end function % skyline


